RECOMBINANT OLEANDOLIDE POLYKETIDE SYNTHASE



Field of the Invention 
The present invention provides recombinant methods and materials for
producing polyketides by recombinant DNA technology. The invention relates to the
fields of agriculture, animal husbandry, chemistry, medicinal chemistry, medicine,
molecular biology, pharmacology, and veterinary technology.

Background of the Invention 

Polyketides represent a large family of diverse compounds synthesized from
2-carbon units through a series of condensations and subsequent modifications.
Polyketides occur in many types of organisms, including fungi and mycelial bacteria,
in particular, the actinomycetes. There is a wide variety of polyketide structures, and
the class of polyketides encompasses numerous compounds with diverse activities.
Erythromycin, FK-506, FK-520, narbomycin, oleandomycin, picromycin, rapamycin,
spinosyn, and tylosin are examples of such compounds. Given the difficulty in
producing polyketide compounds by traditional chemical methodology, and the
typically low production of polyketides in wild-type cells, there has been considerable
interest in finding improved or alternate means to produce polyketide compounds. See
PCT publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; 97/02358; and
98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837; 5,149,639;
5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326; McDaniel et
al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew. Chem. Int. Ed. Engl.
34(8): 881-888, each of which is incorporated herein by reference.

Polyketides are synthesized in nature by polyketide synthase (PKS) enzymes.
These enzymes, which are complexes of multiple large proteins, are similar to the
synthases that catalyze condensation of 2-carbon units in the biosynthesis of fatty
acids. Two major types of PKS enzymes are known; these differ in their composition
and mode of synthesis. These two major types of PKS enzymes are commonly
referred to as Type I or "modular" and Type II or "iterative" PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-, and
16-membered macrolide antibiotics including erythromycin, methymycin,
narbomycin, oleandomycin, picromycin, and tylosin. Modular PKS enzymes for
14-membered polyketides are encoded by PKS genes that often consist of three or more
open reading frames (ORFs). Each ORF of a modular PKS can comprise one, two, or
more "modules" of ketosynthase activity, each module of which consists of at least
two (if a loading module) and more typically three (for the simplest extender module)
or more enzymatic activities or "domains." These large multifunctional enzymes
(>300,000 Da) catalyze the biosynthesis of polyketide macrolactones through
multistep pathways involving decarboxylative condensations between acyl thioesters
followed by cycles of varying β-carbon processing activities (see O'Hagan, D., The
polyketide metabolites; E. Horwood: New York, 1991, incorporated herein by
reference).

During the past half decade, the study of modular PKS function and specificity
has been greatly facilitated by the plasmid-based Streptomyces coelicolor expression
system developed with the 6-deoxyerythronolide B (6-dEB) synthase (DEBS) genes
(see Kao et al., 1994, Science 265: 509-512; McDaniel et al., 1993, Science 262:
1546-1550; and U.S. Patent Nos. 5,672,491 and 5,712,146, each of which is
incorporated herein by reference). The advantages of this plasmid-based genetic
system for DEBS are that it overcomes the tedious and limited techniques for
manipulating the natural DEBS host organism, Saccharopolyspora erythraea, allows
more facile construction of recombinant PKSs, and reduces the complexity of PKS
analysis by providing a "clean" host background. This system also expedited
construction of the first combinatorial modular polyketide library in Streptomyces
(see PCT publication No. WO 98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998, Curr.
Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491, each of
which is incorporated herein by reference). This interest has resulted in the cloning,
analysis, and manipulation by recombinant DNA technology of genes that encode
PKS enzymes. The resulting technology allows one to manipulate a known PKS gene
cluster either to produce the polyketide synthesized by that PKS at higher levels than
occur in nature or in hosts that otherwise do not produce the polyketide. The
technology also allows one to produce molecules that are structurally related to, but
distinct from, the polyketides produced from known PKS gene clusters.

Oleandomycin is an antibacterial polyketide (described in U.S. Patent No.
2,757,123, incorporated herein by reference) produced by a modular PKS in
Streptomyces antibioticus. Oleandomycin has the structure shown below, with the
conventional numbering scheme and stereochemical representation.



As is the case for certain other macrolide antibiotics, the macrolide product of the
PKS, 8,8a-deoxyoleandolide, also referred to herein simply as oleandolide (although
oleandolide in other contexts refers to the epoxidated aglycone), is further modified
by epoxidation (at C-8 and C-8a) and glycosylation (an oleandrose at C-3 and a
desosamine at C-5) to yield oleandomycin.

The reference Swan et al., 1994, entitled "Characterisation of a Streptomyces
antibioticus gene encoding a type I polyketide synthase which has an unusual coding
sequence," Mol. Gen. Genet. 242: 358-362, incorporated herein by reference,
describes the DNA sequence of the coding region of a gene designated ORFB
hypothesized to encode modules 5 and 6 and a fragment of a gene designated ORFA
hypothesized to contain the ACP domain of module 4 of the oleandolide PKS. The
reference Quiros et al., 1998, entitled "Two glycosyltransferases and a glycosidase are
involved in oleandomycin modification during its biosynthesis by Streptomyces
antibioticus," Mol. Microbiol. 28(6): 1177-1185, incorporated herein by reference,
describes genes and gene products involved in oleandomycin modification during its
biosynthesis. In particular, the reference describes a glycosyltransferase involved in
rendering oleandomycin non-toxic to the producer cell and a glycosidase that
reactivates oleandomycin after the glycosylated form is excreted from the cell. See
also Olano et al., Aug. 1998, "Analysis of a Streptomyces antibioticus chromosomal
region involved in oleandomycin biosynthesis, which encodes two
glycosyltransferases responsible for glycosylation of the macrolactone ring," Mol. Gen.
Genet. 259(3): 299-308, and PCT patent publication No. 99/05283, incorporated
herein by reference. While a number of semi-synthetic oleandomycin derivatives have
been described, see U.S. Patent Nos. 4,085,119; 4,090,017; 4,125,705; 4,133,950;
4,140,848; 4,166,901; 4,336,368; and 5,268,462, incorporated herein by reference, the
number and diversity of such derivatives have been limited due to the inability to
manipulate the PKS genes.

Genetic systems that allow rapid engineering of the oleandolide PKS would be
valuable for creating novel compounds for pharmaceutical, agricultural, and
veterinary applications. The production of such compounds could be accomplished if
the heterologous expression of the oleandolide PKS in Streptomyces coelicolor and S.
lividans and other host cells were possible. The present invention meets these and
other needs.

Summary of the Invention

The present invention provides recombinant methods and materials for
expressing PKS enzymes derived in whole and in part from the oleandolide PKS in
recombinant host cells. The invention also provides the polyketides produced by such
PKS enzymes. The invention provides in recombinant form all of the genes for the
proteins that constitute the complete PKS that ultimately results, in
Streptomyces antibioticus, in the production of oleandolide, which is further
glycosylated and epoxidated to form oleandomycin. Thus, in one embodiment, the
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by an
oleandolide PKS gene. In one preferred embodiment of the invention, the DNA
compounds of the invention comprise a coding sequence for at least one and
preferably two or more of the domains of the loading module and extender modules 1
through 4, inclusive, of 8,8a-deoxyoleandolide synthase.

In one embodiment, the invention provides a recombinant expression vector
that comprises a heterologous promoter positioned to drive expression of one or more
of the oleandolide PKS genes. In a preferred embodiment, the promoter is derived
from another PKS gene. In a related embodiment, the invention provides recombinant
host cells that comprise the vector and produce oleandolide. In a preferred
embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In another embodiment, the invention provides a recombinant expression
vector that comprises a promoter positioned to drive expression of a hybrid PKS
comprising all or part of the oleandolide PKS and at least a part of a second PKS. In a
related embodiment, the invention provides recombinant host cells that comprise the
vector and produce the hybrid PKS and its corresponding polyketide. In a preferred
embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In a related embodiment, the invention provides recombinant materials for the
production of libraries of polyketides wherein the polyketide members of the library
are synthesized by hybrid PKS enzymes of the invention. The resulting polyketides
can be further modified to convert them to other useful compounds, such as
antibiotics, typically through hydroxylation and/or glycosylation. Modified
macrolides provided by the invention that are useful intermediates in the preparation
of antibiotics are of particular benefit.

In another related embodiment, the invention provides a method to prepare a
nucleic acid that encodes a modified PKS, which method comprises using the
oleandolide PKS encoding sequence as a scaffold and modifying the portions of the
nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified oleandolide PKS
encoding nucleotide sequence can then be expressed in a suitable host cell and the cell
employed to produce a polyketide different from that produced by the oleandolide
PKS. In addition, portions of the oleandolide PKS coding sequence can be inserted
into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of
cell colonies, constituting a library of colonies, wherein each colony of the library
contains an expression vector for the production of a modular PKS derived in whole
or in part from the oleandolide PKS. Thus, at least a portion of the modular PKS is
identical to that found in the PKS that produces oleandolide and is identifiable as
such. The derived portion can be prepared synthetically or directly from DNA derived
from organisms that produce oleandolide. In addition, the invention provides methods
to screen the resulting polyketide and antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics, and other
useful compounds derived therefrom. The compounds of the invention can also be
used in the manufacture of another compound. In a preferred embodiment, the
compounds of the invention are formulated as antibiotics in a mixture or solution for
administration to an animal or human.

These and other embodiments of the invention are described in more detail in 
the following description, the examples, and claims set forth below. 

Brief Description of the Figures 

Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS055-1 and pKOS055-5 of the invention. Various restriction sites (XhoI,
ClaI, EcoRI) are also shown. Italicized restriction sites in the Figure indicate that not
all of such sites are shown; the EcoRI sites shown are derived from the cosmid DNA
into which the PKS gene segments were inserted. The location of the coding
sequences for modules 1 - 6 of oleandolide PKS is indicated by brackets with labels
underneath the brackets (e.g., mod. 2 is module 2). The sizes (in kilobase (kb) pairs) of
various portions of the inserts are also shown. The open reading frames for the oleAI
(oleA1), oleAII (oleA2), and oleAIII (oleA3) genes are shown as arrows pointing in the
direction of transcription.

Figure 2 shows a function map of the oleandomycin gene cluster. In the top
half of the Figure, the various open reading frames of the genes (oleI, oleN2, oleR,
oleAI, etc.) are shown as arrows pointing in the direction of transcription. Directly
beneath, a line indicates the size in base pairs (bp) of the gene cluster. The bar with
alphanumeric identifiers under the size indicator line references GenBank accession
numbers providing the nucleotide sequence of the indicated region, which sequence
information is incorporated herein by reference. The cross-hatched portion of this bar
indicates the region of the gene cluster for which sequence information is provided
herein. In the bottom half of the Figure, the oleandolide PKS proteins are shown as
arrow bars, with the location of the modules of the PKS shown below, and with the
various domains of the modules shown below the modules.

Figure 3 shows a restriction site and function map of plasmid pKOS039-110,
described in Example 3, below, which is an expression vector that can integrate 
(phiC31 based attachment and integration functions) into the chromosome of 



WO 00/26349 - 7 - PCT/US99/24478 

Streptomyces and other host cells and contains the ermE* promoter positioned to 
drive expression of the oleAI gene. 

Figure 4 shows a restriction site and function map of plasmid pKOS039-130,
described in Example 4, below, which is an expression vector that replicates (SCP2*
origin of replication) in Streptomyces host cells and contains the actI promoter and
actII-ORF4 activator positioned to drive expression of the oleAI, oleAII, and oleAIII
genes.

Figure 5 shows a restriction site and function map of plasmid pKOS039-133,
described in Example 5, below, which is an expression vector that can integrate
(phiC31 based attachment and integration functions) into the chromosome of
Streptomyces and other host cells and contains the actI promoter and actII-ORF4
activator positioned to drive expression of the oleAIII gene.

Detailed Description of the Invention 

The present invention provides useful compounds and methods for producing
polyketides in recombinant host cells. As used herein, the term recombinant refers to
a compound or composition produced by human intervention. The invention provides
recombinant DNA compounds encoding all or a portion of the oleandolide PKS. The
invention provides recombinant expression vectors useful in producing the
oleandolide PKS and hybrid PKSs composed of a portion of the oleandolide PKS in
recombinant host cells. The invention provides the polyketides produced by the
recombinant PKS as well as those derived therefrom by chemical processes and/or by
treatment with polyketide modification enzymes.

To appreciate the many and diverse benefits and applications of the invention,
the description of the invention below is organized as follows. In Section I, the
recombinant oleandolide PKS provided by the invention is described. In Section II,
methods for heterologous expression of the oleandolide PKS and oleandolide
modification enzymes provided by the invention are described. In Section III, the
hybrid PKS genes provided by the invention and the polyketides produced thereby are
described. In Section IV, the polyketide compounds provided by the invention and
pharmaceutical compositions of those compounds are described. The detailed
description is followed by a variety of working examples illustrating the invention.

The oleandolide synthase gene, like other PKS genes, is composed of coding
sequences organized in a loading module, a number of extender modules, and a
thioesterase domain. As described more fully below, each of these domains and
modules is a polypeptide with one or more specific functions. Generally, the loading
module is responsible for binding the first building block used to synthesize the
polyketide and transferring it to the first extender module. The building blocks used to
form complex polyketides are typically acylthioesters, most commonly acetyl,
propionyl, malonyl, 2-hydroxymalonyl, 2-methylmalonyl, and 2-ethylmalonyl CoA.
Other building blocks include amino acid-like acylthioesters. PKSs catalyze the
biosynthesis of polyketides through repeated, decarboxylative Claisen condensations
between the acylthioester building blocks. Each module is responsible for binding a
building block, performing one or more functions, and transferring the resulting
compound to the next module. The next module, in turn, is responsible for attaching
the next building block and transferring the growing compound to the next module
until synthesis is complete. At that point, an enzymatic thioesterase activity cleaves
the polyketide from the PKS.

Such modular organization is characteristic of the class of PKS enzymes that
synthesize complex polyketides and is well known in the art. The polyketide known
as 6-deoxyerythronolide B (6-dEB) is a classic example of this type of complex
polyketide. The genes, known as eryAI, eryAII, and eryAIII (also referred to herein as
the DEBS genes, for the proteins, known as DEBS1, DEBS2, and DEBS3, that
comprise the 6-dEB synthase), that code for the multi-subunit protein known as
DEBS that synthesizes 6-dEB are described in U.S. Patent No. 5,824,513,
incorporated herein by reference. Recombinant methods for manipulating modular
PKS genes are described in U.S. Patent Nos. 5,672,491; 5,843,718; 5,830,750; and
5,712,146; and in PCT publication Nos. 98/49315 and 97/02358, each of which is
incorporated herein by reference.

The loading module of DEBS consists of two domains, an acyltransferase
(AT) domain and an acyl carrier protein (ACP) domain. Each extender module of
DEBS, like those of other modular PKS enzymes, contains ketosynthase (KS), AT,
and ACP domains, and zero, one, two, or three domains for enzymatic activities that
modify the beta-carbon of the growing polyketide chain. A module can also contain
domains for other enzymatic activities, such as, for example, a methyltransferase
activity. Finally, the releasing domain contains a thioesterase and, often, a cyclase
activity.

The AT domain of the loading module recognizes a particular acyl-CoA (for
DEBS this is usually propionyl but sometimes butyryl or acetyl) and transfers it as a
thiol ester to the ACP of the loading module. Concurrently, the AT on each of the
extender modules recognizes a particular extender-CoA (malonyl or alpha-substituted
malonyl, i.e., methylmalonyl, ethylmalonyl, and 2-hydroxymalonyl) and transfers it to
the ACP of that module to form a thioester. Once the PKS is primed with acyl- and
malonyl-ACPs, the acyl group of the loading module migrates to form a thiol ester
(trans-esterification) at the KS of the first extender module; at this stage, extender
module 1 possesses an acyl-KS and a malonyl (or substituted malonyl) ACP. The acyl
group derived from the loading module is then covalently attached to the alpha-carbon
of the malonyl group to form a carbon-carbon bond, driven by concomitant
decarboxylation, and generating a new acyl-ACP that has a backbone two carbons
longer than the loading unit (elongation or extension). The growing polyketide chain
is transferred from the ACP to the KS of the next module, and the process continues.

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess a ketone
at every other carbon atom, producing a polyketone, from which the name polyketide
arises. Most commonly, however, additional enzymatic activities modify the beta keto
group of each two-carbon unit just after it has been added to the growing polyketide
chain but before it is transferred to the next module. Thus, in addition to the minimal
module containing KS, AT, and ACP domains necessary to form the carbon-carbon
bond, modules may contain a ketoreductase (KR) that reduces the keto group to an
alcohol. Modules may also contain a KR plus a dehydratase (DH) that dehydrates the
alcohol to a double bond. Modules may also contain a KR, a DH, and an
enoylreductase (ER) that converts the double bond to a saturated single bond, leaving
the beta carbon as a methylene function. As noted above, modules may contain
additional enzymatic activities as well.
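
The relationship between the reductive domains present in an extender module and the
resulting beta-carbon functionality can be sketched as a small lookup. The following is
an illustrative model only, not part of the invention: the domain abbreviations come
from the description above, while the function name beta_carbon_state and the example
module compositions are hypothetical. The sketch assumes every listed domain is
catalytically active, which is not always the case in a real PKS.

```python
# Illustrative model only: domain abbreviations (KS, AT, ACP, KR, DH, ER) follow
# the text; the function and variable names are hypothetical.

REQUIRED = {"KS", "AT", "ACP"}  # minimal extender module


def beta_carbon_state(domains):
    """Return the beta-carbon functionality left by one extender module.

    Assumes every domain listed is active.
    """
    domains = set(domains)
    missing = REQUIRED - domains
    if missing:
        raise ValueError(f"not a complete extender module, missing {sorted(missing)}")
    if "KR" not in domains:
        return "ketone"       # no reduction: beta-keto group retained
    if "DH" not in domains:
        return "hydroxyl"     # KR only: keto group reduced to an alcohol
    if "ER" not in domains:
        return "double bond"  # KR + DH: alcohol dehydrated
    return "methylene"        # KR + DH + ER: fully saturated


examples = {
    "minimal": ["KS", "AT", "ACP"],
    "KR": ["KS", "AT", "KR", "ACP"],
    "KR+DH": ["KS", "AT", "DH", "KR", "ACP"],
    "KR+DH+ER": ["KS", "AT", "DH", "ER", "KR", "ACP"],
}
states = {name: beta_carbon_state(doms) for name, doms in examples.items()}
for name, state in states.items():
    print(f"{name}: {state}")
```

A module whose KR is present but inactive would have to be modeled by omitting KR
from its domain list, since the sketch does not track activity separately.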

Once a polyketide chain traverses the final extender module of a PKS, it
encounters the releasing domain or thioesterase found at the carboxyl end of most
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The resulting
polyketide can be modified further by tailoring or polyketide modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, such as oxidation or reduction, on the polyketide core molecule.

While the above description applies generally to modular PKS enzymes, there
are a number of variations that exist in nature. For example, some polyketides, such as
epothilone, incorporate a building block that is derived from an amino acid. PKS
enzymes for such polyketides include an activity that functions as an amino acid
ligase or as a non-ribosomal peptide synthetase (NRPS). Another example of a
variation, which is actually found more often than the two domain loading module
construct found in DEBS, occurs when the loading module of the PKS is not
composed of an AT and an ACP but instead utilizes an inactivated KS, an AT, and an
ACP. This inactivated KS is in most instances called KS^Q, where the superscript letter
is the abbreviation for the amino acid, glutamine, that is present instead of the active
site cysteine required for activity. For example, the oleandolide PKS loading module
contains a KS^Q. Yet another example of a variation has been mentioned above in the
context of modules that include a methyltransferase activity; modules can also include
an epimerase activity. The components of a PKS are described further below in
specific reference to the oleandolide PKS and the various recombinant and hybrid
PKSs provided by the invention.


Section I: The Oleandolide PKS 

The oleandolide PKS was isolated and cloned by the following procedure.
Genomic DNA was isolated from an oleandomycin producing strain of Streptomyces
antibioticus (ATCC 11891), partially digested with a restriction enzyme, and cloned
into a commercially available cosmid vector to produce a genomic library. This
library was then introduced into E. coli and probed with a DNA fragment generated
from S. antibioticus DNA using primers complementary to sequences encoding the KS
domains of extender modules 5 and 6 of the oleandolide PKS. Several colonies that
hybridized to the probe were pooled, replated, and probed again, resulting in the
identification of a set of cosmids. These latter cosmids were isolated and transformed
into a commercially available E. coli strain. Plasmid DNA was isolated and analyzed
by DNA sequence analysis and restriction enzyme digestion, which revealed that the
desired DNA had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on two of the cosmids identified.

Further analysis of these cosmids and subclones prepared from the cosmids
facilitated the identification of the location of various oleandolide PKS genes and
ORFs, as well as the modules and domains in the PKS proteins encoded by those
ORFs. The location of these genes and modules is shown on Figures 1 and 2. Figure
1 shows that the complete oleandolide PKS gene cluster is contained within the insert
DNA of cosmids pKOS055-1 (insert size of ~43 kb) and pKOS055-5 (insert size of
~47 kb). Each of these cosmids has been deposited with the American Type Culture
Collection in accordance with the terms of the Budapest Treaty (cosmid pKOS055-1
is available under accession no. ATCC 203798; cosmid pKOS055-5 is available under
accession no. ATCC 203799). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein.

Those of skill in the art will recognize that, due to the degenerate nature of the
genetic code, a variety of DNA compounds differing in their nucleotide sequences can
be used to encode a given amino acid sequence of the invention. The native DNA
sequence encoding the oleandolide PKS of Streptomyces antibioticus is shown herein
merely to illustrate a preferred embodiment of the invention, and the invention
includes DNA compounds of any sequence that encode the amino acid sequences of
the polypeptides and proteins of the invention. In similar fashion, a polypeptide can
typically tolerate one or more amino acid substitutions, deletions, and insertions in its
amino acid sequence without loss or significant loss of a desired activity. The present
invention includes such polypeptides with alternate amino acid sequences, and the
amino acid sequences encoded by the DNA sequences shown herein merely illustrate
preferred embodiments of the invention.
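
The degeneracy of the genetic code can be illustrated with a short sketch that
translates two different DNA sequences into the same amino acid sequence. The two
18-nucleotide sequences below are arbitrary illustrations chosen for this example, and
the names translate, seq_a, and seq_b are hypothetical; only the standard genetic code
table itself is established fact.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two illustrative sequences that differ in nucleotide sequence (assumed examples).
seq_a = "ATGACGAGCGAGCACCGC"
seq_b = "ATGACCTCCGAACATCGT"

peptide_a = translate(seq_a)
peptide_b = translate(seq_b)
differences = sum(a != b for a, b in zip(seq_a, seq_b))

print(peptide_a, peptide_b, differences)
```

Here the two sequences differ at six nucleotide positions yet encode the same
hexapeptide, which is the sense in which a variety of DNA compounds can encode a
given amino acid sequence.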

The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the various
regions of the oleandolide PKS and corresponding coding sequences is provided. To
facilitate description of the invention, reference to a PKS, protein, module, or domain
herein can also refer to DNA compounds comprising coding sequences therefor and
vice versa. Also, unless otherwise indicated, reference to a heterologous PKS refers to
a PKS or DNA compounds comprising coding sequences therefor from an organism
other than Streptomyces antibioticus. In addition, reference to a PKS or its coding
sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free of
materials with which the corresponding DNA would be found in nature) form. These
DNA molecules comprise one or more sequences that encode one or more domains
(or fragments of such domains) of one or more modules in one or more of the ORFs
of the oleandolide PKS gene cluster. Examples of such domains include the KS, AT,
DH, KR, ER, ACP, and TE domains of at least one of the 6 extender modules and
loading module encoded by the 3 ORFs of the oleandomycin PKS genes.

In one embodiment, the DNA molecule comprises an ORF other than or in
addition to the ORFB described in Swan et al., supra, which corresponds to the
oleAIII gene ORF herein; the module is a module other than or in addition to extender
module 5 and/or module 6 of ORFB; and the domain is a domain other than or in
addition to a domain of module 5 and/or module 6 of ORFB or the ACP domain of
module 4 of ORFA. In an especially preferred embodiment, the DNA molecule is a
recombinant DNA expression vector or plasmid. Such vectors can either replicate in
the cytoplasm of the host cell or integrate into the chromosomal DNA of the host cell.
In either case, the vector can be a stable vector (i.e., the vector remains present over
many cell divisions, even if only with selective pressure) or a transient vector (i.e., the
vector is gradually lost by host cells with increasing numbers of cell divisions).

The oleandolide PKS, also known as 8,8a-deoxyoleandolide synthase, is
encoded by three ORFs (oleAI, oleAII, and oleAIII). Each ORF encodes 2 extender
modules of the PKS; the first ORF also encodes the loading module. Each module is
composed of at least a KS, an AT, and an ACP domain. The locations of the various
encoding regions of these ORFs are shown in Figure 2 and described with reference to
the sequence information below.

ORF1 encodes 8,8a-deoxyoleandolide synthase 1 and begins at nucleotide
5772 and ends at nucleotide 18224 in the sequence below. ORF1 encodes a loading
module (encoded by nucleotides 5799-8873), composed of a KS^Q domain (encoded
by nucleotides 5799-7055), a malonyl-specific AT domain (encoded by nucleotides
7458-8563), and an ACP domain (encoded by nucleotides 8634-8873). ORF1 also
encodes extender module 1 (encoded by nucleotides 8955-13349), composed of a KS
domain (KS1, encoded by nucleotides 8955-10205), an AT domain (AT1, encoded by
nucleotides 10512-11549), a KR domain (KR1, encoded by nucleotides 12258-12818),
and an ACP domain (ACP1, encoded by nucleotides 13092-13349), and
extender module 2 (encoded by nucleotides 13407-17966), composed of a KS domain
(KS2, encoded by nucleotides 13407-14690), an AT domain (AT2, encoded by
nucleotides 14997-16031), a KR domain (KR2, encoded by nucleotides 16872-17423),
and an ACP domain (ACP2, encoded by nucleotides 17709-17966).

ORF2 encodes 8,8a-deoxyoleandolide synthase 2 and begins at nucleotide
18267 and ends at nucleotide 29717 in the sequence below. ORF2 encodes extender
module 3 (encoded by nucleotides 18357-22985), composed of a KS domain (KS3,
encoded by nucleotides 18357-19643), an AT domain (AT3, encoded by nucleotides
19965-20999), an inactive KR domain (KR3, encoded by nucleotides 21897-22449),
and an ACP domain (ACP3, encoded by nucleotides 22728-22985), and extender
module 4 (encoded by nucleotides 23046-29396), composed of a KS domain (KS4,
encoded by nucleotides 23046-24329), an AT domain (AT4, encoded by nucleotides
24645-25682), a DH domain (DH4, encoded by nucleotides 25719-26256), an ER
domain (ER4, encoded by nucleotides 27429-28301), a KR domain (KR4, encoded by
nucleotides 28314-28862), and an ACP domain (ACP4, encoded by nucleotides
29147-29396).

ORF3 encodes 8,8a-deoxyoleandolide synthase 3 and begins at nucleotide 29787
and ends at nucleotide 40346 in the sequence below. This sequence has been
previously reported by Swan et al., supra. ORF3 encodes extender module 5
(encoded by nucleotides 29886-34478), composed of a KS domain (KS5, encoded by
nucleotides 29886-31184), an AT domain (AT5, encoded by nucleotides
31494-32531), a KR domain (KR5, encoded by nucleotides 33384-33935), and an ACP
domain (ACP5, encoded by nucleotides 34221-34478), and extender module 6
(encoded by nucleotides 34845-39440), composed of a KS domain (KS6, encoded by
nucleotides 34845-36131), an AT domain (AT6, encoded by nucleotides
36447-37484), a KR domain (KR6, encoded by nucleotides 38352-38903), and an ACP
domain (ACP6, encoded by nucleotides 39183-39440). ORF3 also encodes a TE
domain at nucleotides 39657-40343.
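
As an illustrative aid only, the ORF3 coordinates given above can be tabulated and checked for internal consistency with a short script. The following Python sketch assumes that the coordinates are 1-based and inclusive; the `Region` type and the helper names `in_frame` and `summarize` are hypothetical and are not part of the disclosure.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A named span of the listing (1-based, inclusive coordinates assumed)."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Coordinates copied from the ORF3 description above.
ORF3 = Region("ORF3", 29787, 40346)
ORF3_DOMAINS = [
    Region("KS5", 29886, 31184),
    Region("AT5", 31494, 32531),
    Region("KR5", 33384, 33935),
    Region("ACP5", 34221, 34478),
    Region("KS6", 34845, 36131),
    Region("AT6", 36447, 37484),
    Region("KR6", 38352, 38903),
    Region("ACP6", 39183, 39440),
    Region("TE", 39657, 40343),
]


def in_frame(domain: Region, orf: Region) -> bool:
    """True if the domain lies inside the ORF and spans whole codons in its frame."""
    inside = orf.start <= domain.start and domain.end <= orf.end
    whole_codons = domain.length % 3 == 0
    same_frame = (domain.start - orf.start) % 3 == 0
    return inside and whole_codons and same_frame


def summarize(domains, orf):
    """Return (name, nucleotides, codons, in_frame) for each domain."""
    return [(d.name, d.length, d.length // 3, in_frame(d, orf)) for d in domains]


if __name__ == "__main__":
    for row in summarize(ORF3_DOMAINS, ORF3):
        print(row)
```

Under these assumptions, each ORF3 domain listed above spans a whole number of codons in the reading frame of ORF3. The same check could be applied to the ORF1 and ORF2 coordinates.
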
The DNA sequence below also includes the sequences of a number of the
tailoring enzyme genes in the oleandomycin gene cluster, including oleI (nucleotides
152-1426), oleN2 (nucleotides 1528-2637), oleR (nucleotides 2658-4967), oleP1
(nucleotides 40625-41830), oleG1 (nucleotides 41878-43158), oleG2 (nucleotides
43163-44443), oleM1 (nucleotides 44433-45173), oleY (nucleotides 45251-46411),
oleP (nucleotides 46491-47714), and oleB (nucleotides 47808-49517).
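
For readers who wish to pull an individual gene or domain out of the listing, the following Python sketch shows one possible approach. It assumes the listing layout used in this document (a position number followed by blocks of ten bases) and 1-based, inclusive coordinates; `parse_listing` and `extract` are hypothetical helper names, and the sample text is only the first three lines of the listing.

```python
def parse_listing(text: str) -> str:
    """Join a numbered sequence listing into one uppercase string of bases."""
    bases = []
    for line in text.splitlines():
        tokens = line.split()
        # Drop the leading position number, if the line has one.
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        bases.extend(tokens)
    return "".join(bases).upper()


def extract(sequence: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of the sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


# First three lines of the listing (nucleotides 1-180).
SAMPLE = """\
1 GCATGCCCGC CCGCAACACC GGCTCCCGTA ACGGGGCGAG CCGGTGGTCA TCCATCAGTT
61 TCCTTCCGCC CGGCCCGTGT CAGGCCCGTG TGCGCATACC GCCGTACGGC TGCGCCGGTC
121 CCCCGCGGAA CACCTCACCG GAGTGAGATC CATGACGAGC GAGCACCGCT CTGCCTCCGT
"""

if __name__ == "__main__":
    seq = parse_listing(SAMPLE)
    # oleI is described as starting at nucleotide 152.
    print(len(seq), extract(seq, 152, 154))
```

A gene encoded on the complementary strand would additionally need to be reverse-complemented, which this sketch does not do.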

The sequence of the portion of the oleandomycin gene cluster described above 
follows: 

1 GCATGCCCGC CCGCAACACC GGCTCCCGTA ACGGGGCGAG CCGGTGGTCA TCCATCAGTT 
61 TCCTTCCGCC CGGCCCGTGT CAGGCCCGTG TGCGCATACC GCCGTACGGC TGCGCCGGTC 
121 CCCCGCGGAA CACCTCACCG GAGTGAGATC CATGACGAGC GAGCACCGCT CTGCCTCCGT 
181 GACACCCCGT CACATCTCCT TCTTCAACAT CCCCGGCCAC GGCCACGTGA ACCCGTCACT 
241 CGGCATTGTC CAGGGACTTG TCGCGCGCGG CCAACGGGTC AGCTACGGCA TTACCGACGA 
301 GTTCGGCGCA CAGGTCAAGG CGGGCCGCGC GACGGCCGTT GTGTACGGCT TCATTCTGCC 
361 GGAGGAGTTC AACCCCGAGG AGTTGTTGGC CGAGGACCAG GGTTCCCGAT GGGCCTGTTC 
421 CTTGGCGGAG GCGTTCCGGG TCTTGCCfCA GCTGAGGACG GCTACGCCGA CGACCGGCCG 
481 GGACCTGATC GTCTACGACA TCGCCTCCTG GCCCGCCCCG GTGCTCGGCC GGAAGTGGGA 
541 CATCCCCTTC GTCCAGCTCT CCCCGACCTC CGTCGCCTAC GAGGGCTTCG AGGAGGACGT 
601 ACCCGCGGTG CAGGACCCCA CGGCCGACCG CGGCGAGGAG GCCGCCGCCC CCGCGGGGAC 
661 CGGGGACGCC GAGGAGGGTG CCGAGGCCGA GGACGGCCTG GTGCGCTTCT TCACCCGGCT 
721 CTCGGCCTTC CTGGAGGAGC ACGGGGTGGA CACCCCGGCC ACCGAGTTCC TCATCGCGCC 
781 CAACCGCTGC ATCGTCGGCT GCCGCGCACC TTCCCAGATC AAGGGCGACA CGGTCGGCGA 
841 CAACTACACC TTCGTCGGTC CCACCTACGG CGACCGGTCC CACCAGGGCA CCTGGGAAGG 
901 CCCCGGGCAC GGGCGTCCGG TGCTGCTGAT CGCCCTGGGC TCGGCGTTCA CCGACCACCT 
961 CGACTTCTAC CGCACCTGCC TGTCCGCtGT GGACGGCCTG GACTGGCACG TGGTGCTCTC 
1021 CGTGGGCCGC TTCGTCGACC CCGCGGACCT CGGCGAGGTC CCGCCGAACG TCGAGGTGCA 
1081 CCAGTGGGTG CCGCAGCTCG ACATCCfGAC CAAAGCCTCC GCGTTCATCA CGCACGCGGG 
1141 CATGGGCAGC ACCATGGAGG CCCTGTCGAA CGCGGTGCCC ATGGTCGCGG TGCCGCAGAT 
1201 CGCGGAGCAG ACGATGAACG CCGAGCGfeAT CGTCGAGCTG GGCCTCGGCC GGCACATCCC 
1261 GCGGGACCAG GTCACGGCCG AGAAGCTGCG CGAGGCCGTG CTCGCCGTCG CCTCCGACCC 
1321 CGGTGTCGCC GAACGGCTCG CGGCCGTCCG GCAGGAGATC CGTGAGGCGG GCGGCGCCCG 
1381 GGCGGCCGCC GACATCCTGG AGGGCATCCT CGCCGAAGCA GGCTGACCGC CCCTGCCTGA 
1441 CGGTGCGCGG GCCGCCCGGC CCGCCGCGTG AGAGTCGGCC CCCGTACCCG ACGACGGGTA 
1501 CGGGGGCCGA CGCGCGCGGG CCCGGACTCA GCAGGCGGCC ACCGCGCCCC GTACCGCCTC 
1561 GATCACCGCC TTGACGGCGT CGTCGGACAG GTGCGGGCCT ATGGGCAGGC TCAGCACCTC 
1621 CCGGGCGAGC CGCTCCGCCA CGGGCTGTGC GCGGGCGGCC TGCCGGCTGC CGGCGTACGC 
1681 CTCCGACCGG TGCACCGGCA CCGGGTAGTG GATCAGCGTC TCGACGCCGG CTGCCGCCAG 
1741 CCGCTCCCGC AGCGCGGACC GGTCCGCGGA ACGAATCACG AACAGGTGCC ACACGGGGTC 
1801 CGCCCACGGC GCCGGCCTCG GCAGCACGAT CCCGTCCAGG CCGGCGAGCC CGTCGAGATA 
1861 GCGCGCCGCC ACCGCGGCCC GGCGCTCGGG TCCCAGCCGT CCCAGGTGGG CGAGCTTGAC 
1921 CCGCAGAACG GCCGCTTGCA GCTCGTCCAG CCGGAAGTTG GTGGCCCGGA CCTCGTGCCG 
1981 GTACTTCTCC CGCGACCCGT AGTTGCGCAG CAGCCGCACC CGCTCCGCCA GCTCCGCGTC 
2041 GTCCGTCACC ACGGCGCCGC CGTCACCGAA GCCGCCCAGG TTCTTGCCCG GGTAGAAGCT 
2101 GAAGGCGGTG GTGGACCACG CGCCCACCCG CCGGCCGTAC GCCTGCGCAC CGTGCGCCTG 
2161 GGCGGCGTCC TCCAGGATCC GCACGCCGTG CCGCTCGGCG ACCTCGGACA ACGCCGCCAG 
2221 GTCCGCCGGA TGCCCGTACA GGTGCACCGG GAGGATCACC CGGGTGCGGG AGGTGATCGC 
2281 AGCCTCGACG CGCTCCGGGT CCAGGGTGAA CGTCGCAGGC TCCGGTTCCA CCGCGACGGG 
2341 CTCCGCACCC GTCGCCGAGA CGGCGAGCCA GGTCGCGGCG AAGGTGTGCG CCGGGACGAT 
2401 CACCTCGTCA CCCGGCCCGA TGTCCATGGC GCGCAGCGCC AGTTCCAGGG CGTCGCACCC 
2461 GCTGCCCACC GCCACGCAGT GCCGGGCCCC GCAGTAGGCG GCCCACTCCG TCTCGAACGC 
2521 GGCGAGTTCG GGGCCCAGGA GGTAGCGCCC GGAGTCCAGG ACGCGGCCGG TCGCGGCGTC 
2581 GATGTCGTGC TTGAGCTCCA GGTAGGCGGC CCGGAGGTCC AGGAACGGAA CGTCCATGCG 
2641 TCCTCCGTGG GAGCTGCTCA CGGCGCCGTG GCGCTGAGCG GGAGACGGCC GAGGGACGGG 
2701 CCCACCATGA CCTGCCGTCC GGGTCCGGTC ACCCAGGTGT GGGCGCCGCT GTCCCAGTTC 
2761 TGGAGGGCCC TGCGCTCGAC GTGCAGGGTC AGCCTCCTGC TCTCGCCCGG CCGCAGCTCG 
2821 ACCTTCCCGT AGGCCGCCAG GGCACGCTTG GCCTGCGCCA CCCGCACGTG CGGGGACGGC 
2881 CCCACGTAGA CCTGCGGGAC CTCCTTGCCG GTGCGCGTAC CGGTGTTGCG CAGCGTGAAG 
2941 CAGACGTCGA GCCCGCCGTC CGCCGTCGCC GTCACCTTCA GGTCCCGGTA GTCGAAGGAG 
3001 GTGTAGCACA ACCCGTGGCC GAAGGAGAAC AGCGGCTGGA CGCCCTGCTG TTCGTACCAG 
3061 CGGTAGCCGG AGTAGATGCC CTCGGAGTAG TCCAGTTGGT CATCGACTCC CGGGTAGCGC 
3121 CTGGCGTCCC CGGCGAACGG CGTCTGCCCC TCGTCGGCCG GGAAGGTCTG GGTCAGCCGG 
3181 CCTCCTGGGT CGGCGTCGCC GAACAGCAGG GCGGTGGTCG CCTCGGCGCC GGCCTGGCCC 
3241 GGGTACCACA TGGTGAGCAC CGCGGCGGTC TTCCTCAGCC AGGGCATGGT GAGGGAGGAG 
3301 CCCGTGTTGA GCACCACCAC GGTCCGTGGG TTGACCGCGG CCACGGCGCT GATCAGGTCG 
3361 TCCTGGCGGC CGGGCAGGGA CAGCGACGTG CGGTCCCCGT CCTCCGAGCC GTCGTCGTAC 
3421 GCGAAGACGA CCGCGGTCCT CGCCGTCCGC GCGATCGACA CGGCCCGGTC GATCGCCTCC 
3481 TGGGCGGCCT GCGGAGTGAC CCACGTCAGC TCGAAGGTCA TGGGCGACTT CGCCAGGGCC 
3541 GCGCCGGTGA TGCGCAGCTT GTGCGTTCCG GCCGCCAGCC GCATGGGGCG GCTGCTGACG 
3601 TCGCCGTAGA CCCAGGGCCG ACGGCCGAAC GGCTCCTGGC CGTCGAGTTC GACGTAGGCG 
3661 TTGCCGCCCT GCGCGCGGGC CGCGATGCGG TAGCTGCCGG TGACCGGCAC GGTGATGGTG 
3721 CCGTCGTAGA GGACACCGCC CCCACCGGCG GGGAACACCT CGCCCGAGGG GCGGGGCCGC 
3781 GGAAGAGGGG CGGACTGCGG AACGGGAACC CCGACCGTCT CCTCACCGGT GCTGTAGCGC 
3841 ACGGTGCTGC CGGCGCCGGC CCGTTCGCGG ATGGTGTCCA GAGGGGCGGA CGCGCCGTCC 
3901 GGCACGATGT ACGAACTGCC CAGCCCGGTC ACCTTCGGGA CCTTGGCGGT GGGGCCGATC 
3961 ACGGCGATGT CCGCCGCCGT CTCCGTGGTC AGGGGAAGGG TGGCGCCCTC GTTGCGCAGC 
4021 AGGACCGCGC CGTCCTCGGC GACCTGGCGC GCGACCTTCA AGCCGCCCGC GAGGTCGCGC 
4081 GCCGGGCGGG CGGGCGGATC CTCGTCCAGC AGCCGGAACC GGGCCATCTG CGACACGATG 
4141 CGGGTGACGG CCTCGTCGAG GGCCGACTCG GGGATGCGTC CCTCCCGGAT CGCCGTCTTG 
4201 AGCGGGTCGC CGAAGAACTT GCCGCCGpGT ATCGGCTCGC CCGGGGCGGG TTCGTGGTCC 
4261 AGCTCGATGC CGAGTTCCTG GTCGAGCCCC TTGGTGAGGG CGTCCGTGCT CTGCGTCGCC 
4321 AGCCAGTCCG AGGTCACCCA GCCACGGAAC TTCCACTGCT CCTTGAGGAC CTTGTTCAGC 
4381 AGTTCGTCAC TGCCGCAGGC CGGCTGGfcCG TTGACCTTGT TGTAGGCGCA CATCACCGAG 
4441 CCGGTTCCGG CAGCCACGGC GCTCTCGAAA CCGGGCAGTT CCCGCTCGCG CAACGTCTGT 
4501 TCGTCGACGT TCACGTTAAC GCTGAAACGA TTCTTCTCCT GGTTGTTCGC CGCGTAGTGC 
4561 TTGGTGGCGG CGATCAGCCC CTGACTCTGG ATGCCCTTGA TCTCCGCGGC GGCCATCCGC 
4621 GAGGTGACCA GGGGGTCCTC GCTGAACGTC TCGAAGTTCC GCCCGGCGTA CGGCACGCGT 
4681 ATGGAGTTCA CCATCGGCGC GAACACCACG TCCTGCCCGA AGGCGCGCCC CTCCCGGCCG 
4741 ATCACCGCCC CGTAGGACCG CGCCAGGCCG TCGTCGAAGG TGGAGGCCAG CGCCACGGGA 
4801 GCGGGCAGCG CGAGGGACGG CCGGTGGATC GTGATTCCGG CGGGACCGTC GGTGGCCCGC 
4861 ATCTCGGGTA TGCCGAGGCG GGGAACGCCC GGCAGGTACA CCTTTGCCGA CTCATCGCTC 
4921 GTGTGATAGC TCCAGTGCAC GAACGACAGC TTTTCTTCCA GGGTCATCCG AGCCGTCAGA 
4981 AGACGAGCCG TTTCCCACGG ATCGCCCfcAT TCGGCGACGG ACGGAACAGA GGGGAGCAGG 
5041 GCGAGACCGA GGGCCAGGCC GAGAGTACCC GCGGAGGTCC GTGGCGGGAC CGGACTCCTG 
5101 CGCTGCGCAC GGCCGCCGAG ACGTAACCGA AGTGATCTCA AAAGGCTTCC AAATCCTCCG 
5161 CGCCCTCGTG CTGCGAGGCG CATGAAATGG GCGGTTGTCG CGACCACAGT GCACCGTCAC 
5221 CGAAGCCGGA GCAATGCCCG TGAATAAGGT CGCGCCCTTC CGTGGATGAT CTCCGCACGA 
5281 GATCATGCCC AGCTCAAGTG ATGGTCATGC ACGTACCAAG AAGGGGCTTG CCTGGGGGGC 
5341 GTGAGCTGAT CTAGCGTTGC CGCACGASGA CGAGTCGTGA GCGAGGCGAA CGCTCTGCCG 
5401 CTCAGGGGGT GAACAGACGG CAGCCCGGAC GTTCGACGAG GGTCAAGCGG AACGCAGGCG 
5461 ACAGGACGCG GCCACCCTCC GAGGCACCCG TGCCGACCAT CCTCGCAGGT CCTTCGCCAT 
5521 GCCCGTCGCA ACTCTCCGAT CGCTGCCGCC GATGGCGACA GCCCGGCACC GAGGCCCCTG 
5581 GACCAGGAGG CGAAGCGAGG GCCGGCC6CG ATGCACGAAT CGGACCCAGG CGAACACCGG 
5641 CACATCCACC CCGGCGCGTG CGGTACGGGC CGCGCCCGAT GACGGGCGAA CGACGACCGA 
5701 AAAGCAGACC CCTTGATTCG CTTCCATGGT TGTGGCAGCC GCGGGGAGCG TCGGCAGAGA 
5761 GGTGGGAAAC CATGCATGTC CCCGGCGAGG AAAACGGGCA TTCCATTGCC ATTGTCGGAA 
5821 TTGCGTGCCG ACTGCCGGGC TCTGCCACCC CCCAGGAGTT CTGGAGACTC CTGGCCGACT 
5881 CCGCAGACGC ATTGGACGAG CCCCCCGCCG GCCGTTTCCC GACCGGCTCA TTATCCTCGC 
5941 CCCCCGCTCC GCGCGGCGGA TTCCTCGACA GCATCGACAC TTTCGACGCG GATTTCTTCA 
6001 ACATCTCGCC CAGAGAAGCC GGTGTCCTCG ACCCCCAGCA ACGCCTCGCG CTGGAACTCG 
6061 GCTGGGAGGC GCTGGAAGAC GCCGGAATCG TCCCGCGACA CCTCAGGGGA ACCCGCACCT 
6121 CGGTCTTCAT GGGCGCCATG TGGGACGACT ACGCGCACCT GGCGCACGCA CGGGGAGAAG 
6181 CCGCCCTCAC CCGGCATTCC CTGACGGGAA CGCACCGCGG CATGATCGCC AACCGGCTCT 
6241 CCTACGCCCT GGGCCTCCAA GGCCCCAGCC TCACCGTCGA CACCGGACAA TCCTCCTCCC 
6301 TCGCCGCCGT GCACATGGCC TGCGAGAGCC TGGCCCGCGG CGAATCCGAC CTCGCCCTCG 
6361 TCGGCGGCGT CAACCTCGTC CTCGATCCGG CCGGCACGAC CGGCGTCGAG AGGTTCGGAG 
6421 CACTCTCACC GGACGGCAGG TGCTACACCT TCGACTCCCG GGCGAACGGC TACGCCCGAG 
6481 GAGAGGGCGG CGTCGTAGTC GTCCTCAAGC CCACCCACCG CGCGCTCGCG GACGGTGACA 
6541 CCGTCTACTG CGAGATCCTG GGCAGCGCCC TCAACAACGA CGGCGCCACG GAAGGCCTCA 
6601 CCGTCCCCAG CGCCCGCGCC CAGGCGGACG TCCTGCGACA GGCATGGGAA CGGGCACGCG 
6661 TGGCCCCGAC GGACGTCCAG TACGTGGAAC TGCACGGAAC CGGCACACCG GCCGGCGACC 
6721 CCGTCGAGGC CGAGGGCCTC GGCACCGCGC TCGGCACCGC ACGCCCGGCC GAGGCGCCGC 
6781 TCCTGGTCGG CTCGGTCAAG ACGAACATCG GTCACCTCGA AGGCGCGGCA GGCATCGCGG 
6841 GCCTCCTGAA GACGGTCCTG AGCATCAAGA ACCGGCACCT CCCGGCAAGC CTGAACTTCA 
6901 CCTCGCCCAA CCCCCGCATC GACCTCGACG CCCTGCGCCT GCGCGTCCAC ACCGCGTACG 
6961 GCCCCTGGCC GAGCCCCGAC CGGCCGCTGG TGGCGGGCGT CTCCTCCTTC GGCATGGGCG 
7021 GGACGAACTG CCACGTCGTC CTGTCCGAGT TACGGAACGC GGGAGGCGAC GGCGCCGGAA 
7081 AAGGGCCGTA CACCGGCACG GAAGACCGGC TCGGCGCCAC GGAGGCGGAG AAGAGGCCGG 
7141 ACCCGGCAAC CGGAAACGGT CCTGATCCCG CCCAGGACAC CCACCGCTAC CCGCCGCTGA 
7201 TCCTGTCCGC CCGCAGCGAC GCGGCCCTGC GCGCACAGGC GGAACGGCTC CGCCACCACC 
7261 TGGAACACAG CCCCGGACAG CGCCTGCGGG ACACCGCCTA CAGCCTGGCG ACCCGCCGCC 
7321 AGGTCTTCGA GCGGCACGCG GTGGTCACCG GACACGACCG CGAGGACCTG CTCAACGGCC 
7381 TGCGTGACCT GGAGAACGGC CTCCCGGCCC CCCAGGTCCT GCTCGGCCGC ACGCCCACCC 
7441 CCGAACCGGG CGGCCTCGCC TTCCTCTTCT CCGGGCAGGG CAGCCAGCAG CCCGGCATGG 
7501 GCAAGCGACT CCACCAGGTG TTCCCCGGCT TCCGGGACGC CCTGGACGAG GTCTGCGCCG 
7561 AACTCGACAC CCACCTCGGC CGACTCCTCG GCCCCGAGGC CGGCCCGCCC CTGCGCGACG 
7621 TGATGTTCGC CGAGCGGGGC ACGGCGCACA GCGCCCTGCT CTCCGAGACC CACTACACCC 
7681 AGGCCGCCCT CTTCGCCCTG GAAACCGCCC TCTTCCGCCT CCTGGTCCAG TGGGGCCTGA 
7741 AACCCGACCA CCTCGCAGGC CACTCCGTCG GCGAGATCGC GGCCGCCCAC GCAGCAGGCA 
7801 TCCTCGACCT GTCCGACGCG GCCGAACTCG TGGCCACCCG CGGCGCGTTG ATGCGTTCCC 
7861 TGCCCGGCGG CGGCGTCATG CTCTCGGTCC AGGCACCCGA GTCCGAGGTC GCACCCCTGC 
7921 TGCTCGGCCG TGAGGCCCAC GTCGGCCTGG CCGCCGTGAA CGGCCCCGAC GCGGTGGTCG 
7981 TGTCCGGCGA GCGCGGCCAC GTCGCCGCCA TCGAACAGAT CCTCCGGGAC AGGGGCCGCA 
8041 AAAGCCGGTA CCTGCGCGTC AGCCACGCCT TCCACTCCCC GCTCATGGAA CCGGTGCTGG 
8101 AGGAGTTCGC CGAAGCCGTC GCCGGCCTGA CCTTCCGGGC ACCGACCACA CCCCTCGTCT 
8161 CCAACCTCAC CGGCGCACCA GTCGACGACC GGACCATGGC CACGCCCGCC TACTGGGTCC 
8221 GGCACGTCCG GGAAGCGGTC CGCTTCGGCG ACGGCATCCG GGCACTCGGG AAACTGGGCA 
8281 CCGGCAGCTT CCTGGAAGTC GGGCCGGACG GCGTCCTCAC CGCCATGGCG CGCGCATGCG 
8341 TCACCGCCGC CCCGGAGCCC GGCCACCGCG GCGAACAGGG CGCCGATGCC GACGCCCACA 
8401 CCGCGTTGCT GCTGCCCGCC CTGCGCCGAG GACGdiGACGA GGCGCGATCG CTCACCGAGG 
8461 CCGTGGCACG GCTCCACCTG CACGGCGTGC CGATGGACTG G^CCTCCGTC CTCGGCGGCG 
8521 ACGTGAGCCG GGTCCCCCTC CCGACGTACG CCTTCCAACG CC&ATCCCAC TGGCTGCCGT 
8581 CCGGAGAGGC TCACCCGCGA CCGGCGGACG ACACCGAATC CGGCACGGGA CGGACCGAGG 
8641 CGTCCCCGCC GCGGCCGCAC GACGTCCTGC ACCTCGTGCG CTCCCACGCG GCGGCTGTGC 
8701 TCGGACATTC CCGGGCCGAG CGGATCGACC CCGACCGCGC GTTCCGCGAC CTCGGCTTCG 
8761 ACTCGCTGAC GGCGCTGGAA CTGCGGGACC GGCTCGACAC CGCACTCGGC CTCCGCCTGC 
8821 CCAGCAGCGT GCTCTTCGAC CACCCGAGCC CCGGCGCACT GGCACGCTTC CTCCAGGGCG 
8881 ACGACACGAG GCGCCCCGAA CCAGGGAA5A CGAACGGCAC GCGCGCCACG GAGCCAGGCC 
8941 CGGACCCGGA CGACGAGCCG ATCGCCATCG TCGGCATGGC GTGCCGCTTC CCGGGTGGCG 
9001 TGACCTCTCC GGAGGACCTG TGGCGCCTGC TCGCCjGCAGG CGAGGACGCG GTGTCCGGCT 
9061 TCCCCACGGA CCGGGGCTGG AACGTCACTG ACTCCGCCAC GCGCCGCGGA GGCTTCCTGT 
9121 ACGACGCCGG CGAGTTCGAT GCCGCCTTCT TCGGTATCTC GCCGCGTGAG GCGTTGGTGA 
9181 TGGACCCGCA GCAGCGGTTG CTGCTGGAGA CGTCCTGGGA GGCCCTCGAA CGCGCGGGCG 
9241 TGAGCCCCGG CAGTCTGCGC GGCAGCGAGA CGGCCGTGTA CATCGGAGCC ACAGCGCAGG 
9301 ACTACGGCCC CCGACTGCAC GAGTCGGACG ACGACTCGGG CGGCTACGTC CTGACCGGCA 
9361 ATACCGCCAG CGTGGCCTCC GGCCGCATCG CCTACTCCCT CGGTCTGGAG GGGCCTGCGG 
9421 TCACGGTGGA CACGGCGTGT TCGTCGTCGC TGGTGGCACT GCACCTGGCG GTGCAGGCGC 
9481 TGCGCCGTGG CGAGTGCTCA CTGGCATTGG CCGGCGGAGC CACGGTGATG CCTTCGCCCG 
9541 GCATGTTCGT GGAGTTCTCA CGGCAAGGGG GCCTCTCCGA GGACGGCCGC TGCAAGGCGT 
9601 TCGCCGCGAC GGCGGACGGC ACCGGCTGGG CCGAGGGTGT GGGTGTGTTG TTGGTGGAGC 
9661 GGTTGTCGGA TGCGCGGCGG TTGGGTCATC GGGTGTTGGC GGTGGTGCGG GGGAGTGCGG 
9721 TCAATCAGGA TGGTGCGTCG AATGGGTTGA CGGCGCCGAA TGGTCCGTCG CAGCAGCGGG 
9781 TGATCCGTGC GGCGTTGGCT GACGCGGGTC TGGTTCCTGC TGATGTGGAT GTGGTGGAGG 
9841 CGCATGGTAC GGGGACGCGG TTGGGTGATC CGATCGAGGC TCAGGCGTTG TTGGCGACGT 
9901 ATGGGCAGGG GCGTGCGGGT GGGCGTCCGG TGGTGTTGGG GTCGGTGAAG TCGAACATCG 
9961 GTCATACGCA GGCGGCGGCT GGTGTGGCTG GTGTGATGAA GATGGTGCTG GCGCTGGGGC 
10021 GGGGTGTGGT GCCGAAGACG TTGCATGTGG ATGAGCCGTC TGCGCATGTG GACTGGTCGG 
10081 CTGGTGAGGT GGAGTTGGCG GTTGAGGCGG TGCCGTGGTC GCGGGGTGGG CGGGTGCGGC 
10141 GGGCTGGTGT GTCGTCGTTC GGGATCAGTG GCACGAATGC GCATGTGATC GTGGAGGAGG 
10201 CGCCTGCGGA GCCGGAGCCG GAGCCGGAGC GGGGTCCGGG CTCTGTTGTG GGTGTGGTGC 
10261 CGTGGGTGGT GTCCGGGCGG GATGCGGGGG CGTTGCGTGA GCAGGCGGCA CGCTTGGCTG 
10321 CGCACGTGTC GGGTGTAAGT GCGGTCGATG TGGGCTGGTC GTTGGTGGCC ACGAGGTCGG 
10381 TGTTCGAGCA CCGGGCGGTG ATGGTCGGCA GTGAACTCGA TGCCATGGCG GAGTCGTTGG 
10441 CCGGCTTCGC TGCGGGTGGG GTTGTGCCGG GGGTGGTGTC GGGTGTGGCT CCGGCTGAGG 
10501 GTCGTCGTGT GGTGTTCGTC TTTCCTGGTC AGGGTTCGCA GTGGGTGGGG ATGGCGGCTG 
10561 GGTTGCTGGA TGCGTGCCCG GTGTTCGCGG AGGCGGTGGC GGAGTGCGCT GCGGTGCTGG 
10621 ACCCGTTGAC CGGTTGGTCG CTGGTCGAGG TGTTGCGCGG TGGTGGTGAG GCTGTTCTTG 
10681 GGCGGGTTGA TGTGGTGCAG CCGGCGTTGT GGGCGGTGAT GGTGTCACTG GCCCGGACCT 
10741 GGCGGTATTA CGGTGTGGAG CCTGCTGCGG TTGTGGGGCA TTCGCAGGGT GAGATTGCTG 
10801 CGGCTTGTGT GGCTGGGGGG TTGAGTCTGG CCGATGGTGC GCGGGTGGTG GTGTTGCGGA 
10861 GCCGGGCGAT CGCCCGGATC GCTGGTGGGG GCGGCATGGT CTCCGTCAGC CTGCCGGCCG 
10921 GCCGTGTCCG CACCATGCTG GAGGAGTTCG ACGGCAGGGT TTCCGTTGCG GCGGTCAACG 
10981 GTCCGTCCTC GACCGTGGTG TCGGGTGACG TCCAGGCCCT GGATGAGTTG TTGGCCGGTT 
11041 GTGAGCGGGA GGGTGTCCGG GCTCGTCGTG TCCCGGTGGA CTATGCCTCC CACTCCGCGC 
11101 AGATGGACCA GTTACGCGAT GATCTGCTGG AAGCGCTGGC GACGATCGTC CCTACATCGG 
11161 CGAACGTACC GTTCTTCTCG ACGGTGACGG CGGACTGGCT GGACACGACC GCTCTGGATG 
11221 CGGGGTACTG GTTCACGAAT CTGCGGGAGA CGGTCCGGTT CCAAGAAGCC GTCGAAGGGC 
11281 TCGTGGCTCA GGGGATGGGC GCGTTCGTCG AGTGCAGCCC GCACCCCGTC CTCGTCCCGG 
11341 GCATCACAGA AACACTCGAC ACCTTCGACG CCGACGCTGT CGCACTGTCG TCGCTGCGGC 
11401 GTGACGAAGG CGGCCTGGAT CGGTTCCTCA CGTCCCTCGC GGAAGCCTTC GTCCAGGGCG 
11461 TCCCGGTCGA CTGGTCCCGC GCCTTCGAGG GTGCGAGCCC CCGCACCGTC GACCTGCCCA 
11521 CCTACCCCTT CCAACGGCAA CGCTACTGGC TGCTCGACAA GGCGGCGCAA CGGGAACGCG 
11581 AGCGGCTGGA GGACTGGCGC TACCACGTCG AGTGGCGCCC CGTCACGACA CGACCTTCCG 
11641 CACGGCTGTC CGGTGTCTGG GCCGTGGCGA TTCCGGCACG TCTGGCCCGT GACTCACTGT 
11701 TGGTCGGCGC CATCGACGCA CTGGAGCGAG GCGGCGCCCG TGCCGTGCCC GTGGTGGTCG 
11761 ATGAGCGGGA CCACGACCGG CAAGCGCTGG TCGAGGCTCT GCGGAACGGG CTGGGCGACG 
11821 ACGACCTCGC CGGTGTGCTC TCCCTTTTGG CCCTCGACGA AGCCCCGCAC GGTGACCACC 
11881 CCGACGTGCC CGTCGGCATG GCCGCTTCGC TGGCGCTCGT GCAGGCGATG GCCGACGCCG 
11941 CGGCCGAGGT GCCCGTATGG TTCGCGACCC GAGGCGCCGT AGCGGCACTG CCCGGTGAGT 
12001 CACCGGAGCG ACCCAGGCAG GCGCTGCTCT GGGGACTGGG ACGGGTCGTC GCCCTGGAAC 
12061 AGCCGCAGAT ATGGGGCGGG TTGGTCGACC TCCCGCAACA CCTGGACGAG GACGCGGGCC 
12121 GACGGCTGGT CGATGTCGTG GGCGGCCTGG CGGACGAGGA CCAGCTTGCC GTACGGGCCT 
12181 CCTCCGTCCT CGCCCGACGC CTCGTTCGTA CGCCGGGTCA CCGTATGTCG AGCCAGGCGG 
12241 GCGGGCGCGA GTGGTCGCCC AGCGGCACGG TCCTGGTGAC CGGAGGCACC GGGGCGCTGG 
12301 GCGCGCACGT CGCCCGCTGG CTGGCCGGCA AGGGCGCCGA GCACCTGGTA CTCATCAGCC 
12361 GTCGCGGAGC GGACGCAGCC GGGGCCGCTG CCCTTCGGGA CAGCCTCACG GACATGGGTG 
12421 TCCGGGTGAC CCTGGCCGCG TGCGATGCAG CGGACCGGCA CGCACTGGAG ACGCTCCTCG 
12481 ACTCGCTGCG CACGGATCCG GCGCAGCTGA CGGCCGTCAT CCACGCCGCG GGTGCTCTGG 
12541 ACGACGGCAT GACGACGGTG CTCACACCGG AGCAGATGAA CAACGCCCTG CGAGCGAAAG 
12601 TCACGGCCAC CGTCAACCTG CACGAACTGA CCCGGGACCT CGACCTCTCG GCCTTCGTAC 
12661 TGTTCTCGTC CATCTCCGCC ACCCTGGGAA TCCCCGGGCA GGCCAACTAC GCGCCGGGAA 
12721 ACTCGTTCTT GGACGCCTTC GCGGAATGGC GCAGGGCTCA GGGGCTCGTG GCGACCTCCA 
12781 TCGCCTGGGG ACCGTGGTCC GGCGGCACCG GCATGGCACA TGAAGGGTCG GTGGGCGAAC 
12841 GGCTCCAGCG GCACGGTGTA CTCGCCATGG AACCCGCGGC GGCCATCGCT GCGCTCGACC 
12901 ACACGCTGGC GAGCGACGAA ACCGCAGTGG CCGTGGCCGA CATCGACTGG AGCCGGTTCT 
12961 TCCTGGCGTA CACAGCACTG CGGGCACGGC CCTTGATCGG AGAGATACCC GAGGCACGCC 
13021 GCATGCTGGA GTCCGGCTCA GGCCCC<3fcCG ACCTCGAGCC GGACCGTGCC GAACCCGAGC 
13081 TTGCCGTGCG TCTCGCGGGC CTCACCGCGG TCGAGCAGGA ACGTCTTCTG GTGCAGCTCG 
13141 TGAGGGAGCA GGCCGCCGTC GTCCTCGGAC ATTCCGGCGC CGAGGCGGTG GCTCCGGACC 
13201 GAGCGTTCAA GGATCTCGGA TTCGACTCGC TGACCTCGGT CGAACTGCGC AACCGGCTGA 
13261 ACACCGCCAC CGGCCTCAGA CTGCCCGfGA CGGCCGTCTT CGACTACGCG AGGCCCGCGG 
13321 CGCTGGCCGG CCATCTGCGC TCCAGGCTGA TCGACGACGA TGGTGACCAC GGTGCCTTGC 
13381 CCGGCGTGGA GAAGCACGCG ATCGACGAGC CGATCGCGAT CGTGGGAATG GCATGCCGCT 
13441 TCCCGGGAGG CATCGCTTCC CCGGAGGATC TGTGGGACGT GCTCACCGCT GGTGAGGACG 
13501 TTGTCTCCGG ACTGCCGCAG AACCGCGGGT GGGACTTGGG GCGCCTGTAC GATCCCGATC 
13561 CGGACCGGGC CGGTACGTCA TACATGCGTG AGGGTGCTTT CCTGCACGAG GCGGGGGAGT 
13621 TCGACGCGGC CTTCTTCGGT ATCTCGCCGC GTGAGGCGTT GGCGATGGAC CCGCAGCAGC 
13681 GGTTGCTGCT GGAGACGTCC TGGGAGGCCC TCGAACGGGC CGGCATCACT CCTTCCAAGC 
13741 TGGCGGGCAG TCCGACCGGT GTGTTCTTCG GCATGTCGAA CCAGGACTAC GCCGCCCAGG 
13801 CGGGCGACGT GCCGTCCGAG CTGGAGGGCT ACCTGCTCAC CGGCTCCATC TCCAGCGTCG 
13861 CTTCGGGGCG TGTTGCTTAC ACGTTCGGTC TTGAGGGGCC TGCGGTGACG GTGGATACGG 
13921 CGTGTTCGTC GTCGTTGGTG GCGTTGCATC TGGCGGTGCA GGGGTTGCGG CGGGGTGAGT 
13981 GTTCGCTTGC GTTGGTGGGT GGGGTGACGG TGATGTCGTC GCCGGTGACG TTGACGACGT 
14041 TCAGTCGGCA GCGGGGTTTG TCGQTGGATG GGCGGTGCAA GGCGTTCGCG GCTTCGGCGG 
14101 ATGGTTTTGG TGCTGCCGAG GGTGTGGGTG TGTTGTTGGT GGAGCGGTTG TCGGATGCGC 
14161 GGCGGTTGGG TCATCGGGTG TTGGCGGTGG TGCGGGGGAG TGCGGTCAAT CAGGATGGTG 
14221 CGTCCAATGG TCTGGCGGCG CCGAATGGTC CGTCGCAGCA GCGGGTGATC CGTGCGGCGT 
14281 TGGCTGACGC GGGTCTGGCT CCTGCCGATG TGGATGTGGT GGAGGCGCAT GGCACGGGGA 
14341 CGCGGTTGGG TGATCCGATC GAGGCTCAGG CGTTGCTGGC GACGTATGGG CAGGGTCGTA 
14401 CCAGTGGGCG TCCGGTGTGG CTGGGGTCGG TGAAGTCGAA CATCGGGCAT ACGCAGGCGG 
14461 CGGCCGGTGT GGCTGGTGTG ATGAAGATGG TGCTGGCGTT GGGTCGGGGT GTGGTGCCGA 
14521 AGACGTTGCA TGTGGATGAG CCGTCACCGC ATGTGGACTG GTCGGCTGGT GAGGTGGAGT 
14581 TGGCGGTTGA GGCGGTGCCG TGGTCGCGGG GTGGGCGGGT GCGGCGGGCT GGTGTGTCGT 
14641 CGTTCGGGAT CAGCGGCACG AATGCGCATG TGATCGTGGA GGAGGCGCCT GCGGAGCCTT 
14701 CGGTGGAGGA GGGTCCGGGC TCCGTTGTGG GTGTGGTGCC GTGGGTGGTG TCCGGGCGGG 
14761 ATGCGGGGGC GTTGCGTGCA CAGGCGGCAC GCTTGGCTGC GCACGTGTCG AGCACGGGTG 
14821 CGGGTGTGGT TGATGTGGGC TGGTCGTTGG TGGCCACGAG GTCGGTGTTC GAGCACCGGG 
14881 CGGTAATGGT CGGCACTGAT CTTGATTCCA TGGCGGGGTC GTTGGCCGGC TTCGCTGCGG 
14941 GTGGTGTTGT GCCGGGGGTG GTGTCGGGTG TGGCTCCGGC TGAGGGCCGT CGTGTGGTGT 
15001 TCGTCTTTCC TGGTCAGGGT TCGCAGTGGG TGGGGATGGC GGCTGGGTTG CTGGATGCGT 
15061 GTCCGGTGTT CGCGGAGGCG GTGGCGGAGT GTGCCGCGGT GCTGGACCGG TTGACCGGTT 
15121 GGTCGCTGGT CGAGGTGTTG CGTGGTGGTG AGGCTGTTCT TGGGCGGGTT GATGTGGTGC 
15181 AGCCGGCGTT GTGGGCGGTG ATGGTGTCAC TGGCTCGGAC CTGGCGGTAT TACGGTGTGG 
15241 AGCCTGCTGC GGTTGTGGGG CATTCGCAGG GTGAGATTGC TGCGGCTTGT GTGGCTGGGG 
15301 GGTTGAGTCT GGCCGATGGT GCGCGGGTGG TGGTGTTGCG GAGTCGGGCG ATCGCCCGGA 
15361 TCGCTGGTGG GGGCGGCATG GTCTCGGTCG GTCTTTCAGC TGAGCGTGTC CGCACCATGC 
15421 TCGACACCTA CGGCGGCAGG GTTTCCGTCG CGGCGGTCAA TGGCCCGTCC TCGACCGTGG 
15481 TGTCCGGTGA CGCCCAGGCC CTGGATGAGT TGTTGGCCGG TTGTGAGCGG GAGGGTGTCC 
15541 GGGCTCGTCG TGTCCCGGTG GACTATGCCT CCCACTCCGC GCAGATGGAC CAGTTACGCG 
15601 ATGAGTTGCT GGAGGCGCTG GCGGACGTCA CTCCGCAGGA CTCCAGTGTT CCGTTTTTCT 
15661 CGACGGTGAC GGCGGACTGG CTGGACACGA CCGCTCTGGA TGCGGGGTAC TGGTTCACGA 
15721 ATCTGCGGGA GACGGTCCGG TTCCAGGAAG CCGTTGAAGG GCTTGTGGCT CAGGGGATGG 
15781 GCGCGTTCGT CGAGTGCAGC CCGCACCCTG TCCTCGTCCC GGGCATCACA GAAACACTCG 
15841 ACACCTTCGA CGCCGACGCT GTCGCACTGT CGTCGCTGCG GCGTGACGAA GGCGGCCTGG 
15901 ATCGGTTCCT CACGTCCCTC GCGGAAGCCT TCGTCCAAGG CGTTCCCGTC GACTGGACCC 
15961 ATGCCTTCGA GGGTGGACGC CCGCGCTTCG TCGACCTGCC CACCTATGCC TTCCAGCGAC 
16021 AGCGCTACTG GCTGCACGAA GAGCCGCTGC AAGAGCCGGT CGATGAGGCG TGGGATGCCG 
16081 AGTTCTGGTC TGTGGTCGAA CGCGGCGATG CCACAGCCGT GTCCGACTTG CTGAGCACGG 
16141 ACGCCGAGGC TTTGCACACG GTGTTGCCGG CTTTGTCGTC GTGGCGGCGG CGTCGGGTGG 
16201 AGCATCGACG GCTTCAGGAC TGGCGTTACC GGGTGGAGTG GAAGCCTTTC CCGGCCGCGC 
16261 TTGATGAGGT GCTCGGTGGT GGCTGGTTGT TCGTGGTGCC GCGGGGCTTG GCGGATGATG 
16321 GTGTGGTTGC GCGGGTGGTG GCTGCCGTCA CGGCGCGGGG TGGCGAGGTC AGTGTCGTGG 
16381 AGCTCGATCC GACCCGTCCT GACCGCCGGG CTTATGCGGA GGCTGTCGCG GGCCGTGGTG 
16441 TGAGCGGGGT CGTGTCGTTC TTGTCCTGGG ATGATCGGCG GCACTCGGAG CATTCTGTTG 
16501 TTCCCGCCGG TCTTGCCGCG TCGCTGGTGT TGGCGCAGGC GTTGGTTGAT CTTGGCCGGG 
16561 TTGGTGAGGG GCCGCGGTTG TGGCTGGTGA CGCGGGGTGC GGTGGTTGCT GGTCCTTCGG 
16621 ATGCCGGTGT GGTGATTGAT CCGGTGCAGG CGCAGGTGTG GGGTTTCGGG CGTGTTCTGG 
16681 GTCTGGAGCA TCCCGAGTTG TGGGGTGGGC TGGTGGACCT GCCGGTGGGG GTTGATGAGG 
16741 AGGTGTGCCG GCGGTTCGTG GGTGTTGTGG CGTCGGCTGG TTTTGAGGAT CAGGTGGCGG 
16801 TGCGTGGTTC GGGTGTGTGG GTGCGTCGTC TGGTGCGTGC TGTGGTGGAT GGTGGTGGGG 
16861 GTGGTTGGCG GCCGCGTGGG ACGGTGTTGG TCACGGGTGG TCTTGGTGGT TTGGGTGCGC 
16921 ATACGGCCCG GTGGTTGGTG GGTGGTGGGG CGGATCATGT GGTTCTTGTG AGCCGTCGTG 
16981 GTGGCAGTGC GCCTGGTGCT GGGGATCTGG TGCGGGAGCT GGAGGGGTTG GGCGGGGCTC 
17041 GGGTGTCGGT GCGGGCCTGT GATGTGGCTG ATCGTGTGGC GTTGCGGGCG TTGTTGTCGG 
17101 ATCTGGGTGA GCCGGTGACG GCGGTGTTCC ATGCGGCTGG TGTTCCTCAG TCGACGCCTT 
17161 TGGCGGAGAT CTCTGTCCAG GAGGCGGCTG ATGTGATGGC GGCCAAGGTG GCGGGTGCGG 
17221 TGAATCTGGG TGAGTTGGTG GATCCCTGTG GTCTGGAGGC GTTTGTGTTG TTCTCCTCCA 
17281 ATGCCGGTGT GTGGGGCAGT GGGGGGCAGG CGGTGTATGC GGCGGCGAAT GCGTTTCTTG 
17341 ATGCGTTGGC GGTGCGTCGT CGGGGTGTTG GTCTGCCGGC CACGAGTGTG GCGTGGGGGA 
17401 TGTGGGCTGG TGAGGGGATG GCGTCGGTGG GTGGTGCGGC GCGGGAGTTG TCCCGTCGGG 
17461 GGGTGCGGGC GATGGATCCC GAGCGTGCTG TGGCGGTGAT GGCTGATGCG GTGGGTCGTG 
17521 GTGAGGCGTT CGTCGCGGTC GCTGATGTGG ACTGGGAACG TTTCGTCACC GGTTTCGCTT 
17581 CTGCCCGTCC CCGTCCGTTG ATCAGTGACC TGCCGGAGGT GCGTGCTGTT GTGGAGGGCC 
17641 AGGTCCAGGG CCGGGGCCAG GGGTTGGGCT TGGTCGGTGA GGAGGAGTCG TCGGGGTGGT 
17701 TGAAGCGGTT GTCGGGGTTG TCTCGTGTGC GGCAGGAGGA GGAGTTGGTG GAGTTGGTCC 
17761 GTGCTCAGGC TGCCGTTGTT CTCGGGCATG GTTCCGCGCA GGACGTCCCG GCTGAGCGGG 
17821 CGTTCAAGGA GTTGGGTTTT GATTCCCTCA CTGCTGTCGA GCTACGCAAC GGGCTGGCCG 
17881 CGGCCACCGG GATCCGGCTG CCGGCCACCA TGGCATTCGA TCATCCCACC GCCACCGCCA 
17941 TCGCACGCTT CCTGCAATCC GAACTCGTGG GAAGTGACGA CCCGCTGACG CTCATGCGGT 
18001 CGGCGATCGA CCAGTTGGAG ACCGGTCTGG CTCTGCTGGA ATCGGACGAA GAAGCTCGCT 
18061 CGGAAATCAC GAAGCGATTG AACATTCTTC TGCCCCGCTT CGGAAGCGGA GGCAGTTCGA 
18121 GAGGCAGGGA AGCAGGACAA GACGCAGGCG AACATCAGGA TGTCGAGGAC GCCACCATCG 
18181 ATGAGCTATT CGAGGTGCTC GACAACGAAC TCGGCAATTC CTGAAAACCT GTCCGACTGC 
18241 TACCGCGACC TTGACCGGAG AACGCTGTGA CGAACGACGA AAAGATCGTC GAGTATCTCA 
18301 AGCGCGCGAC CGTGGACCTG CGCAAGGCCC GGCACCGCAT CTGGGAGCTG GAGGACGAGC 
18361 CCATCGCGAT CACGTCGATG GCCTGCCACT TCCCGGGCGG GATCGAGAGT CCGGAGCAGC 
18421 TGTGGGAACT CCTGTCCGCC GGAGGCGAGG TGCTTTCCGA GTTCCCCGAC GACCGCGGCT 
18481 GGGACCTGGA CGAGATCTAC CATCCTGACC CGGAACACAG TGGGACGAGC TACGTCCGTC 
18541 ACGGCGGTTT CCTGGATCAT GCGACGCAGT TCGACACGGA CTTCTTCGGT ATCTCGCCGC 
18601 GTGAGGCGTT GGCGATGGAC CCGCAGCAGC GGTTGCTGCT GGAGACGTCC TGGCAGCTTT 
18661 TCGAGCGCGC AGGAGTCGAT CCCCATACGC TGAAGGGAAG CCGGACCGGA GTATTCGTCG 
18721 GCGCCGCACA CATGGGTTAT GCGGACAGGG TGGACACTCC GCCGGCGGAG GCCGAGGGCT 
18781 ACCTGCTGAC AGGGAACGCC TCGGCCGTTG TCTCCGGGCG TATTTCCTAC ACCTTCGGCC 
18841 TTGAGGGGCC TGCGGTGACG GTGGACACGG CGTGCTCGTC GTCGCTGGTG GCGCTGCACC 
18901 TGGCGGTGCA GGCGCTGCGC CGTGGCGAGT GCTCGCTGGC GGTCGTCGGT GGTGTGGCCG 
18961 TCATGTCGGA CCCGAAGGTC TTCGTCGAGT TCAGCCGGCA GCGCGGACTG GCCAGGGACG 
19021 GCCGGTCCAA GGCTTTTGCG GCGTCAGCGG ATGGTTTCGG CTTCGCCGAG GGAGTTTCGC 
19081 TGCTCTTGCT GGAGCGGTTG TCGGATGCGC GGCGGTTGGG TCATCGGGTG TTGGCGGTGG 
19141 TGCGGGGGAG TGCGGTCAAT CAGGATGGTG CGTCCAATGG TCTGGCGGCG CCGAATGGTC 
19201 CGTCGCAGCA GCGGGTGATT CGTGCGGCGT TGGCTGACGC GGGTCTGGCT CCTGCCGATG 
19261 TGGATGTGGT GGAGGCGCAT GGTACGGGGA CGCGGTTGGG TGATCCGATC GAGGCTCAGG 
19321 CGTTGCTGGC GACGTATGGG CAGGGGCGTA CCAGTGGGCG TCCGGTGTGG CTGGGGTCGG 
19381 TGAAGTCGAA CATCGGTCAT ACGCAGGCGG CGGCCGGTGT GGCTGGTGTG ATGAAGATGG 
19441 TGCTGGCTCT GGAGCGGGGT GTGGTGCCGA AGACGTTGCA CGTGGATGAG CCGTCTCCGC 
19501 ATGTGGACTG GTCGACCGGT GCGGTGGAGT TGCTGACTGA AGAGCGGCCG TGGGAGCCGG 
19561 AGGCTGAGCG TCTTCGTCGG GCAGGCATTT CCGCCTTCGG TGTCAGTGGC ACGAATGCGC 
19621 ATGTGATCGT GGAGGAGGCA CCTGCGGAAC CGGAACCGGA GCCGGAGCCG GGAACTCGTG 
19681 TGGTTGCTGC CGGTGATCTG GTGGTGCCGT GGGTGGTGTC CGGGCGGGAT GCGGGGGCGT 
19741 TGCGTGCACA GGCGGCACGC TTGGCTGCGC ATGTGTCGAG CACGGGTGCG GGTGTGGTTG 
19801 ATGTGGGCTG GTCGTTGGTG GCCACGAGGT CGGTGTTCGA GCACCGGGCG GTGATGGTCG 
19861 GCACTGATCT TGATTCCATG GCGGGGTCGT TGGCCGGGTT TGCTGCGGGT GGGGTTGTGC 
19921 CGGGGGTGGT GTCGGGTGTG GCTCCGGCTG AGGGTCGTCG TGTGGTGTTC GTCTTTCCTG 
19981 GTCAGGGTTC GCAGTGGGTG GGGATGGCGG CTGGGTTGCT GGATGCGTGT CCGGTGTTCG 
20041 CGGAGGCGGT GGCGGAGTGT GCCGCGGTGC TGGACCCGTT GACCGGTTGG TCGCTGGTCG 
20101 AGGTGTTGCG CGGTGGTGAG GCTGTTCTTG GGCGGGTTGA TGTGGTGCAG CCGGCGTTGT 
20161 GGGCGGTGAT GGTGTCACTG GCTCGGACCT GGCGGTATTA CGGTGTGGAG CCTGCTGCGG 
20221 TTGTGGGGCA TTCGCAGGGT GAGATTGCTG CGGCTTGTGT GGCTGGGGGG TTGAGTCTGG 
20281 CCGATGGTGC GCGGGTGGTG GTGTTGCGGA GCCGGGCGAT CGCCCGGATC GCCGGTGGGG 
20341 GCGGCATGGT CTCCGTCAGT CTCCCGGCCG GCCGTGTCCG CACCATGCTC GACACCTACG 
20401 GCGGCCGGTT GTCGGTGGCT GCGGTCAACG GCCCGTCCTC GACCGTGGTG TCCGGTGACG 
20461 CCCAGGCCCT GGATGAGTTG TTGGCCGGCT GTGAGCGGGA GGGGQTCCGG GCTCGTCGTG 
20521 TCCCGGTGGA CTATGCCTCC CACTCCGCGC AGATGGACCA GTTACGCGAT GAGCTGCTGG 
20581 AAGCGCTGGC GGACATCACT CCGCAACACT CCAGCGTTCC GTTCTTCTCG ACGGTGACGG 
20641 CGGACTGGCT GGACACGACC GCTCTGGATG CGGGGTACTG GTTCACGAAT CTGCGGGAGA 
20701 CGGTCCGGTT CCAGGAAGCC GTCGAAGGGC TTGTGGCTCA GGGGATGGGC GCGTTCGTCG 
20761 AGTGCAGCCC ACACCCCGTC CTCGTCCCCG GTATCGAGCA GACCCTCGAC ACCGTGGAAG 
20821 CCGATGCTGT GGCGCTGGGT TCGCTACGGC GTGATGAGGG CGGCCTGGGA CGGTTCCTCA 
20881 CGTCCCTCGC GGAAGCCTTC GTCCAGGGCG TCCCGGTCGA CTGGTCCCGC ACCTTCGAGG 
20941 GTGCGAGCCC CCGCACCGTC GACCTGCCCA CCTATCCCTT CCAACGGCAA CGTTTCTGGT 
21001 TGGAGGGATC CCCGGCGTTG TCTTCGAACG GCGTCGAGGG TGAGGCGGAC GTCGCGTTCT 
21061 GGGATGCGGT CGAGCGCGAG GACTCGGCGG TTGTAGCCGA GGAGTTGGGG ATCGACGCCA 
21121 AGGCTCTGCA CATGACATTG CCGGCCTTGT CGTCGTGGCG GCGGCGTGAG CGGCAGCGTC 
21181 GGAAGGTGCA GCGCTGGCGT TACCGGGTGG AGTGGAAGCG TCTCCCGAAT TCGCGGGCAC 
21241 AGGAGTCGCT GCAGGGCGGC TGGTTGCTCG TCGTCCCGCA GGGCCGTGCC GGCGATGTCC 
21301 GCGTCACTCA GTCGGTGGCG GAGGTGGCGG CCAAGGGTGG TGAAGCCACG GTCCTGGAGG 
21361 TCGACGCCCT GCATCCCGAC CGCGCAGCAT ACGCCGAGGC CCTCACCCGG TGGCCGGGTG 
21421 TGCGGGGTGT GGTGTCGTTC CTGGCGTGGG AGGAGCAGGC CCTTGCCGAA CACCCCGTTC 
21481 TGTCTGCGGG TCTGGCGGCA TCGCTGGCGT TGGCCCAGGC GTTGATCGAT GTCGGCGGGT 
21541 CCGGTGAGTC GGCGCCGCGT CTGTGGCTGG TCACGGAAGC TGCCGTCGTG ATCGGTGCTG 
21601 CCGACACCGG TGCGGTGATC GACCCCGTAC ACGCGCAGCT GTGGGGCTTC GGCCGTGTCC 
21661 TTGCTCTGGA ACACCCCGAA TTGTGGGGCG GGCTGATCGA CCTGCCCGCT GTGGCAGGCG 
21721 AGCCTGGTTC GATTACCGAC CACGCGCATG CCGACCTACT GGCCACGGTC CTGGCCACGA 
21781 TGGTGCAGGC TGCTGCCCGA GGCGAGGACC AGGTCGCGGT CCGGACGACC GGTACTTACG 
21841 TACCCAGGCT GGTGCGTTCA GGCGGCAGTG CACACTCGGG TGCGCGGAGG TGGCAGCCGC 
21901 GCGACACCGT ACTGGTCACC GGCGGGATGG GACCGCTGAC CGCCCACATC GTCCGTTGGC 
21961 TGGCTGACAA CGGTGCCGAC CAGGTAGTAC TCCTGGGAGG TCAGGGAGCA GACGGCGAGG 
22021 CCGAGGCGCT GAGGGCCGAG TTCGACGGGC ACACGACGAA GATCGAACTC GCGGACGTGG 
22081 ACACCGAGGA CAGCGACGCG CTGCGGTCCT TGCTCGACCG CACGACCGGC GAACACCCGC 
22141 TGCGCGCGGT CATCCATGCG CCGACCGTGG TCGAGTTCGC CTCGGTGGCC GAGTCGGACC 
22201 TGGTGCGATT CGCCCGCACC ATCAGCAGCA AGATCGCCGG CGTCGAGCAG CTCGACGAGG 
22261 TGCTGAGCGG CATCGACACG GCGCACGACG TGGTCTTCTT CTCCTCCGTC GCGGGCGTCT 
22321 GGGGAAGCGC GGGGCAGAGC GCCTACGCGG CGGGCAACGC CTTCCTCGAC GCCGTCGCCC 
22381 AGCACCGCCG TCTGCGCGGA CTGCCCGGTA CGTCGGTGGC CTGGACTCCG TGGGACGACG 
22441 ATCGATCCCT TGCCTCCCTC GGTGACTCGT ACCTCGACCG ACGAGGACTG CGAGCACTGT 
22501 CCATACCCGG CGCGCTCGCC TCCCTCCAGG AAGTGCTCGA CCAGGACGAG GTCCACGCCG 
22561 TGGTGGCGGA TGTCGACTGG GAGCGGTTCT ACGCCGGCTT CAGTGCCGTC CGGCGCACTT 
22621 CCTTCTTCGA CGACGTGCAC GACGCCCACC GGCCGGCCCT GTCCACGGCT GCGACCAACG 
22681 ACGGACAGGC CCGGGACGAG GACGGCGGTA CGGAACTCGT ACGACGTCTG CGTCCGCTGA 
22741 CCGAGACGGA GCAACAGCGA GAGCTCGTGT CGCTCGTCCA GAGTGAAGTC GCTGCCGTCC 
22801 TAGGCCACTC CTCCACCGAC GCGGTCCAGC CACAGCGCGC GTTCCGAGAG ATCGGGTTCG 
22861 ACTCACTGAC AGCGGTCCAG CTCCGGAACC GGCTTACGGC CACCACGGGC ATGCGCCTTC 
22921 CGACAACGCT GGTCTTCGAC TACCCGACCA CCAACGGACT CGCCGAGTAC CTGCGCTCCG 
22981 AACTGTTCGG TGTGTCCGGC GCACCAGCTG ACCTCTCCGT CGTCCGGAAC GCGGATGAGG 
23041 AGGACGACCC CGTCGTCATC GTGGGGATGG CCTGCCGGTT CCCGGGCGGG ATCGATACGC 
23101 CGGAAGCCTT CTGGAAGCTG CTCGAAGCGG GCGGCGATGT CATCTCCGAA CTTCCGGCCA 
23161 ACCGCGGCTG GGACATGGAG CGACTCCTGA ACCCGGACCC CGAGGCGAAG GGCACCAGCG 
23221 CCACACGCTA CGGCGGTTTC CTCTACGACG CCGGGGAGTT CGACGCCGCC TTCTTCGGTA 
23281 TCTCGCCGCG TGAGGCGTTG GCGATGGACC CGCAGCAACG GCTGCTGCTG GAAACCGTCT 
23341 GGGAGCTCAT CGAGAGCGCC GGCGTGGCGC CCGACTCGCT CCACCGGAGC CGGACCGGCA 
23401 CGTTCATCGG CAGCAACGGC CAGTTCTACG CACCGCTGCT GTGGAACTCC GGCGGTGATC 
23461 TGGAGGGCTA CCAAGGCGTG GGCAACGCCG GCAGCGTCAT GTCCGGCCGC GTCGCCTACT 
23521 CCCTCGGTCT TGAGGGGCCT GCGGTGACGG TGGATACGGC GTGTTCGTCG TCGCTGGTGG 
23581 CACTGCACCT GGCGGTGCAG GCGCTGCGCC GTGGCGAGTG CTCACTCGCC ATAGCCGGCG 
23641 GTGTGACGGT GATGTCCACA CCGGACAGCT TCGTTGAGTT CTCACGGCAA CAGGGCCTTT 
23701 CCGAGGACGG CCGTTGCAAG GCGTTCGCGA GCACAGCCGA TGGTTTCGGC CTCGCCGAGG 
23761 GCGTTTCGGC GCTGTTGGTG GAGCGGTTGT CGGATGCGCG GCGGTTGGGT CATCGGGTGT 
23821 TGGCGGTGGT GCGGGGGAGT GCGGTCAATC AGGATGGTGC GTCGAATGGG TTGACGGCGC 
23881 CGAATGGTCC GTCGCAGCAG CGGGTGATTC GTGCGGCGTT GGCTGACGCG GGTCTGGCTC 
23941 CTGCTGATGT GGATGTGGTG GAGGCGCATG GTACGGGGAC GCGGTTGGGT GATCCGATCG 
24001 AGGCTCAGGC GTTGTTGGCG ACGTATGGGC AGGGTCGTGC GGGTGGGCGT CCGGTGGTGT 
24061 TGGGGTCGGT GAAGTCGAAC ATCGGGCATA CGCAGGCGGC GGCTGGCGTG GCTGGTGTGA 
24121 TGAAGATGGT GCTGGCGCTG GAGCGGGGTG TGGTGCCGAA GACGTTGCAT GTGGATGAGC 
24181 CGTCACCGCA TGTGGACTGG TCGGCTGGTG AGGTGGAGTT GGCGGTTGAG GCGGTGCCGT 
24241 GGTCGCGGGG TGGGCGGGTG CGGCGGGCTG GTGTGTCGTC GTTCGGGATC AGTGGCACGA 
24301 ATGCGCATGT GATTGTGGAG GAGGCGCCTG CGGAGCCGGA GCCGGAGCCG GGAACTCGTG 
24361 TGGTTGCTGC TGGTGATCTG GTGGTGCCGT GGGTGGTGTC CGGGCGGGAT GCGGGGGCGT
24421 TGCGTGAGCA GGCGGCCCGG TTGGCTGCGC ACGTGTCGAG CACGGGTGCG GGTGTGGTTG
24481 ATGTGGGGTG GTCGTTGGTG GCCACGAGGT CGGTGTTCGA GCACCGGGCG GTGATGGTCG
24541 GCAGTGAACT CGATTCCATG GCGGAGTCGT TGGCTGGCTT CGCTGCGGGT GGGGTTGTGC
24601 CGGGGGTGGT GTCGGGTGTG GCTCCGGCTG AGGGTCGTCG TGTGGTGTTC GTCTTTCCTG
24661 GTCAGGGTTC GCAGTGGGTG GGGATGGCGG CTGGGTTGCT GGATGCGTGT CCGGTGTTCG
24721 CGGAGGCGGT GGCGGAGTGT GCCGCGGTGC TGGATCCGGT GACGGGTTGG TCGCTGGTCG 
24781 AGGTGTTGCG CGGTGGTGGT GAGGCTGTTC TTGGGCGGGT TGATGTGGTG CAGCCGGCGT
24841 TGTGGGCGGT GATGGTGTCA CTGGCCCGGA CCTGGCGGTA TTACGGTGTG GAGCCTGCTG
24901 CGGTTGTGGG GCATTCGCAG GGTGAGATCG CTGCGGCTTG TGTGGCTGGG GGGTTGAGTC
24961 TGGCCGATGG TGCGCGGGTG GTGGTGTTGC GGAGCCGGGC GATCGCCCGG ATCGCTGGTG
25021 GGGGCGGCAT GGTCTCGGTC GGTCTTTCAG CTGAGCGTGT CCGCACCATG CTCGACACCT 
25081 ACGGTGGCCG GGTTTCGGTC GCGGCGGTCA ATGGCCCGTC CTCGACCGTC GTGTCCGGTG 
25141 ACGTCCAGGC CCTGGATGAG TTGTTGGCCG GTTGTGAGCG GGAGGGTGTC CGGGCTCGTC 
25201 GTGTCCCGGT GGACTATGCC TCCCACTCCG CGCAGATGGA CCAGTTACGC GATGAGCTGC 
25261 TGGAAGCGCT GGCGGACATC ACTCCGCAAC ATTCCAGTGT TCCGTTCTTC TCGACGGTGA 
25321 CGGCGGACTG GCTGGACACG ACCGCTCTGG ATGCGGGGTA CTGGTTCACG AATCTGCGGG 
25381 AGACGGTCCG GTTCCAGGAA GCCGTCGAAG GGCTCGTGGC TCAGGGGATG GGCGCGTTCG 
25441 TCGAGTGCAG CCCGCACCCC GTCCTCGTCC CCGGTATCGA GCAGACCCTC GACGCCCTCG
25501 ACCAGAACGC CGCCGTACTC GGCTCCCTGC GGCGTGACGA AGGCGGCCTG GACCGACTCC 
25561 TCACATCCCT CGCGGAAGCC TTCGTCCAAG GCGTTCCCGT CGACTGGACC CACGCCTTCG 
25621 AAGGCATGAC CCCCCGCACC GTCGACCTGC CCACCTACCC CTTCCAACGA CAGCACTACT 
25681 GGCCCAAGCC CGCACCGGCC CCCGGCGCGA ACCTGGGCGA CGTGGCGTCC GTGGGCCTCA 
25741 CCGCGGCCGG CCACCCCCTT CTGGGCGCGG TCGTGGAGAT GCCCGACTCC GACGGGTTGG
25801 TGCTCACCGG GCAGATCTCC CTGCGGACCC ATCCCTGGCT CGCCGACCAC GAGGTGCTCG 
25861 GATCGGTGCT CCTGCCGGGC ACCGCGTTCG TCGAGCTTGC CGTCCAGGCC GCCGACCGCG 
25921 CCGGTTACGA CGTACTGGAC GAGCTGACGC TGGAGGCGCC CCTCGTGCTC CCCGACAGGG 
25981 GCGGCATCCA GGTGCGTCTG GCCCTCGGGC CGTCCGAGGC AGACGGACGC CGGTCCCTCC 
26041 AGCTGCACAG CAGGCCGGAG GAGGCTGCCG GGTTCCACCG CTGGACGAGG CACGCGAGTG
26101 GATTCGTCGT TCCCGGCGGT ACCGGGGCGG CGCGGCCCAC CGAGCCGGCC GGCGTGTGGC 
26161 CGCCCGCAGG TGCCGAGCCG GTCGCTCTCG CATCGGACCG GTACGCCCGG CTCGTCGAGC 
26221 GCGGCTACAC CTACGGCCCC TCCTTCCAGG GGCTGCACAC CGCATGGCGC CACGGGGACG 
26281 ACGTGTACGC GGAAGTGGCG CTGCCAGAAG GAACACCGGC CGACGGCTAC GCCCTGCATC 
26341 CGGCCCTGCT GGACGCGGCG GTCCAGGCCG TCGGACTCGG CTCGTTCGTC GAGGATCCCG 
26401 GCCAGGTGTA CCTGCCGTTC CTCTGGAGCG ACGTGACGCT GCACGCGACC GGGGCCACGT 
26461 CCCTGCGGGT GAGGGTTTCA CCGGCCGGTC CCGACACCGT TGCGCTGGCC CTCGCCGACC
26521 CGGCCGGGGC GCCGGTGGCC ACGGTGGGCG CCCTCCGTCT GCGTACGACG TCCGCGGCGC 
26581 AGCTCGCCCG TGCGCGCGGG AGCGCGGAAC ACGCGATGTT CCGCGTGGAG TGGGTGGAGG 
26641 AGGGCTCGGC CGCGGACCGG TGCCGGGGCG GCGCGGGCGG GACGACGTAC GAGGGGGAAC 
26701 GCGCCGCCGA GGCCGGGGCC GCCGCTGGTA CCTGGGCCGT ACTCGGCCCC CGGGTGCCGG 
26761 CCGCCGTCCG GACGATGGGC GTGGATGTCG TCACCGCCCT CGACACGCCG GACCACCCCG
26821 CGGACCCGCA GAGCCTCGCG GACCTGGCGG CGCTCGGGGA CACCGTTCCC GACGTGGTCG 
26881 TCGTGACCAG CCTCCTGAGC CTCGCCTCCG GAGCGGATTC CCCCCTAGGG AACCGGCCCC
26941 GGCCGACCGC CGCCGAGCAG GACACCGCCG CCACGGTCGC CGGCGTCCAC AGCGCACTCC 
27001 ACGCGGCCCT GGACCTGGTG CAGGCATGGC TGGCCGACGA ACGCCACACC GCCTCCCGGC 
27061 TGGTGCTCGT CACCCGGCAC GCGATGACCG TCGCCGAGTC CGACCCCGAG CCTGACCTGC
27121 TCCTCGCCCC GGTGTGGGGA CTCGTGCGGT CCGCCCAGGC CGAGAACCCC GGCCGCTTCG
TGCTCGCCGA

CGGCCGCATC 

CCACGGACGA 

CGGAAGCGGG 

TGGGCCCCGA 

TCCTGGCCCT 

TCACCGAGGT 

TGACCGGTGG 

GTGGCTGGTC 

CCCTGCACGA 

GCGGTGTCGG 

CCGCGAGCAA 

GCTCGTCCAG 

TCGATGTCGT 

GCGAGGGCGG 

TCGGGGCGGA 

CCGAGCGGAT 

GGTTGCCGCC 

GCCAGGCACG 

GAACCGTGCT 

CCGAGCACGA 

CGGCCGAACT 

GCAAGGCGCT 

TTCACACGGC 

ACACCGTCCT 

TGGACCTCGA 

CCGGCCAGGG 

CGCGCCGCGG 

TCACCGGCGG 

TGCCGACCGA 

TCCTGCCGAT 

CGCCGCTGCT 

CGACCGCCGC 

ACCCACGTCG 

GTGGCCCCGA 

CCGCAGTCGA 

TCGTGTTCGA 

CCGCGACACA 

TGTCTGCGGC 

TCATCGCCAC 

CGGGCAACGC 

CAACCGGGGA 

AGCGGTTGGA 

GGAAAACGAC 

GGCGCGCCAC 

CCCGAGAGCC 

CCGTCAACAC 

GTCTACCCGA 

CCGGTACGTC 

CCTTCTTCGG 

TGGAGACGTC 

GCGACACCGG 

TGGTCGAAGG 

GTGTTGCTTA 

CGTCGTTGGT 

CGTTGGTGGG 

AGCGGGGTTT 

GTGCTGCCGA 

GTCATCGGGT 

GGTTGACGGC 

CGGGTCTGGC 

GTGATCCGAT 

GTCCGGTGTG 

TGGCTGGTGT 

ATGTGGATGA 

AAGAGCGGCC 

GTGTCAGTGG 



CATCGACGGC 

GGAGGTGGCG 

GGGACTGGTC 

CACCCTGGCG 

CGAGGTACGG 

GGGCATGTAC 

CGGCGGGGGC 

ATTCGGGCCG 

CTTCGCCGAG 

CCTGGCAGGC 

CATGGCGGCC 

GGGCAAGTGG 

GACGACCGAG 

CCTGAATGCC 

CCGGTTCGTC 

CGGCGTCCCG 

CGGGCAGATG 

GTTGCGCGCC 

TCATGTGGGC 

GATCACCGGG 

CGTCCGCCGG 

CGGTGCGCTG 

CAAGGCCCTC 

CGGCGTGCTC 

CAAACCCAAG 

CCCGGCCCTG 

CAGTTACGCC 

GCTCACCTCC 

CCTGGCCGAC 

CGAGGCCCTG 

GCGCCTGAAC 

GAGTGGTCTG 

CCCCGCCACC 

TGCCCTGCGC 

GGCCATCGAC 

ACTCAGAAAC 

CTACCCCAAC 

ACCCACCGCA 

CGCGTCACCC 

ACGGCTGGCC 

GGACAACCGC 

GCACACGGCG 

GACGTGATGG 

GCACCGGGAG 

GACCGAACTC 

GATCGCCATC 

GCCCGAACAG 

GGACCGCGGG 

GTACGTGCGT 

GATCfCGCCG 

CTGGGAGGCA 

CGTGTACATC 

CCTGGAAGGC 

CACGTTCGGT 

GGCGTTGCAT 

TGGGGTGACG 

GTCGGTGGAT 

GGGTGTGGGT 

GTTGGCGGTG 

GCCGAATGGT 

TCCTGCTGAT 

CGAGGCTCAG 

GCTGGGGTCG 

GATGAAGATG 

GCCGTCACCG 

GTGGGAGCCG 

CACGAACGCG 



GACGAGGCAT 
ATACGGGCCG 
GTGGCCGACG 
AACCTCGCCC 
ATCGCCGTCC 
CCGGACGAGG 
GTCACGACGC 
GTGGCCGTGA 
GCCGCGTCGG 
CTGCGCGGCG 
GTGCAGTTGG 
GACGTTCTCG 
TTCGAGCAGC 
CTCTCGGGTG 
GAGATGGGCA 
GACATCCGGT 
CTCGACGAGA 
TGGCCGGTGC 
AAGGTCGTCC 
GCGGGCACGC 
CTGCTGCTGG 
GGCGCCGAGG 
CTGGAGGACA 
GACGACGGTG 
GTGGACGCGG 
TTCGTGATCT 
GCGGCCAATC 
GTGTCACTCG 
ATCGACCGTG 
CACCTGTTCG 
GAGGCCGCGC 
GTCCGGGTGC 
GGCCCCGAGG 
GACCTCGTCC 
GCCGAACAGG 
CGGCTGAACG 
CCGAGCGCGC 
GCCCCGCTGC 
GGCGGACCGG 
ACCCTTGCCT 
AGCGGCCCCG 
GCGTGGACGT 
CCGCCGGCCG 
GTTTTGGTGG 
AAGGAGGTCA 
GTGGGAATGA 
TTCTGGGACC 
TGGGACTTGG 
GAGGGCGGTT 
CGTGAGGCGT 
TTCGAGAGCG 
GGCGCGTGGA 
CAGCTCGCCA 
CTTGAGGGGC 
CTGGCGGTGC 
GTGATGTCGT 
GGGCGGTGCA 
GTGTTGTTGG 
GTGCGGGGGA 
CCGTCGCAGC 
GTGGATGTGG 
GCGTTGTTGG 
GTGAAGTCGA 
GTGCTGGCGC 
CACGTGGACT 
GAGGCTGAGC 
CATGTGATCG 



CCTGGGATGC 

GCGCCGTGTA 

AGGCTGCGGG 

TGGTGCCGTG 

GTGCCGCCGG 

GGCTCATGGG 

TCGCGCCAGG 

CGCACCACCG 

TGCCGGTGGC 

GCGAGTCGGT 

CACGGCACTG 

CGGCGCAGGG 

GCTTCCGCGC 

ACTTCGTCGA 

AGACCGACAT 

ACGTCGCCTT 

TCATGGCGCT 

GGCGCGCCCA 

TCACCGTCCC 

TGGGAGCCCT 

TCAGCCGCAG 

TCACGGTGGC 

TACCGCCCGA 

TGGTGTCCGG 

CCCTGACCCT 

TCTCATCGGC 

AGTTCCTGGA 

GCTGGGGGCT 

ACCGGATGAG 

ACAGGGCAAC 

TGGAGGACCG 

GGCACAGGCC 

CGTTCGCCCG 

GCGGCCACGT 

CCTTCCGGGA 

CCGAGACCGG 

TCGCCGATCA 

TCGCCGAACT 

CATCCGCGGT 

CGCAGTGGAC 

GCGAGTCCGG 

CGGACGACGA 

AGTCAGCGAG 

CTGAGGCGGA 

GCGATCGACT 

GCTGCCGGTT 

TGCTGAACAG 

GGCGCCTGTA 

TCCTGTACGA 

TGGCGATGGA 

CCGGTATCAA 

GCACCGGCTA 

TCGGCACCAC 

CTGCGGTGAC 

AGGGGTTGCG 

CGCCGGTGAC 

AGGCGTTCCC 

TGGAGCGGTT 

GTGCGGTCAA 

AGCGGGTGAT 

TGGAGGCGCA 

CGACGTATGG 

ACATCGGGCA 

TGGGGCGGGG 

GGTCGGCCGG 

GTCTTCGTCG 

TGGAGGAGGC 



TCTGCCCCGA 

CGTACCGCGG 

GCCCTGGCGG 

CCCGGACGCC 

GGTCAACTTC 

CGCGGAGGCG 

TGACCGGGTG 

GATGCTCGTA 

GTTCCTGACC 

GCTGGTGCAC 

GGATGCCGAG 

CCTCGACGAG 

GACCAGTGGT 

CGCCTCGGCG 

CCGTACCGAC 

CGACCTCGCC 

CTTCGACGCC 

CGAGGCACTG 

GGCCGCGCTC 

GGTCGCCCGC 

CGGCGTCGCC 

GGCCTGCGAC 

GCATCCGGTC 

GCTCACCCCT 

GGAGTCAGTG 

AGCGAGCATG 

CACCCTCGCC 

GTGGCACGAG 

CCGGGCGGGG 

GGAACTCGGC 

GGCCGCGGAC 

GTCGGCGCGG 

GGAGCTGGCG 

CGCCCTGGTG 

CATCGGTTTC 

CCTCCGCTTG 

CCTGCTCGAA 

GGAACGGGTG 

GGACGAGGAG 

ACACCTCCCG 

GCAGGCCCAG 

TCTCTTCGCC 

TCCTTTCGTC 

GAAGCTGCGC 

CCGCGAGACC 

CCCCGGCGGC 

CGGCGGTGAC 

CGATCCCGAT 

CTCGGGGGAG 

CCCGCAGCAG 

GCGCGCCGCT 

TGCCGGCAGC 

ACTAGGGGCC 

GGTGGATACG 

GCGGGGTGAG 

GTTGACGACG 

GGCTTCGGCG 

GTCGGATGCG 

TCAGGATGGT 

CCGTGCGGCG 

TGGTACGGGG 

GCAGGGGCGT 

TACGCAGGCG 

TGTGGTGCCG 

TGCGGTGGAG 

GGCAGGCATC 

GCCTGCGGAA 



GCCGTCGCCT 

CTGGCCCGCG 

CTGGACGTCA 

TCCCGCCCGC 

CGGGACGTCC 

GCGGGCGTCG 

ATGGGCCTGG 

CGGATGCCGC 

GCGTACTACG 

TCCGCTGCGG 

GTGTTCGGCA 

GAGCACATCG 

GGGCGCGGGA 

CGTCTCCTGC 

CTCGGCGTCG 

GAGGCGGGTG 

GGTGTCCTGC 

AGGTTCGTCA 

GACGCCGAGG 

CACCTCGTCA 

CCCGACCTGG 

GTCGCCAACC 

ACGGGCATCG 

GAACGGGTGG 

ATCGGCGAAC 

CTGGGCGGGC 

CGACACCGGG 

GCCAGCGGTC 

ATCGCGCCCA 

GATCCGGTAC 

GGAACACTGC 

GCAGGTACCG 

GCGGCACCGG 

CTCGGACACA 

GACTCCCTGA 

CCCGGCACGC 

CTCCTCGCTC 

GAACAACTCC 

ACGCGCACGC 

GTCGGTTCGC 

GAATCCGGAG 

TTCCTCGACA 

CTTCTGCTGG 

GAATACCTGT 

GAGGAACGGG 

GGCGACGCCA 

GGCATCGCGG 

CCGGACCGGG 

TTCGACGCCG 

CGGTTGCTGC 

CTGAGAGGCA 

CCCTACCGCC 

GCTTCGGGGC 

GCGTGTTCGT 

TGTTCGCTGG 

TTCAGTCGGC 

GATGGTTTTG 

CGGCGGTTGG 

GCGTCGAATG

TTGGCTGACG 

ACGCGGTTGG 

GCGGGTGGGC 

GCGGCCGGTG 

AAGACGTTGC 

TTGCTGACTG 

TCCGCCTTCG 

CCGGAGCCGG
AGCCGGGAAC
GGGATGCGAG 
GTGCGGTCGA 
TTGCGATCGG 
GGGTGGTGCC 
TCTTTCCTGG 
CGGTGTTCGC 
CGCTGGTCGA 
AGCCGGCGTT 
AGCCTGCTGC 
GGTTGAGTCT 
TCGCTGGTGG 
TGGAGGAGTT 
TGTCCGGTGA 
GGGCTCGTCG 
ATGAGCTGCT 
CGACGGTGAC 
ATCTGCGGGA 
GCGCGTTCGT 
ACGCCCTCGA 
ACCGGTTTCT 
GCGCCTTCGA 
AGCACTACTG 
GCTTCTGGTC 
ATGTAGAGGC 
TTCGTGCCGA 
CGCTGCCCGA 
TCGCTGAGTC 
AGGTGGACAC 
CCGGCCCGGA 
ACCTCGCCGC 
ATCTTGGCCG 
CTGGTCCTTC 
GGCGTGTTCT 
GGGTTGATGA 
ATCAGGTGGC 
ATGGTGGTGG 
GTTTGGGTGC 
TGAGCCGTCG 
TGGGCGGGGC 
CGTTGTTGTC 
AGTCGACGCC 
TGGCGGGTGC 
TGTTCTCCTC 
ATGCGTTTCT 
TGGCGTGGGG 
TGTCCCGTCG 
CGGTGGGTCG 
CCGGTTTCGC 
TTGTGGAGGG 
CGTCGGGGTG 
TGGAGTTGGT 
CGGCTGAGCG 
ACGGGCTGGC 
ACGCCACCGC 
CGGCCGTGCC 
ACCAGCTGAA 
TCAACAGCAC 
AGACGTGCTG 
ACATGCCTGA 
AACGCCTGCG 
CGGCGATGAG 
TGGCCGAAGG 
CCTTGTATCA 
GGTACGACGC 
CGATGGACCC 
ATATCGATCC 



TCGTGTGGTT 
GGCGTTGCGT 
TGTGGGCTGG 
CAGTGAACTC 
GGGGGTGGTG 
TCAGGGTTCG 
GGAGGCGGTG 
GGTGTTGCAG 
GTGGGCGGTG 
GGTTGTGGGG 
GGCCGATGGT 
GGGCGGCATG 
CGACGGCCGG 
CGTCCAGGCC 
TGTCCCGGTG 
GGAGGCGCTG 
GGCGGACTGG 
GACGGTCCGG 
CGAGTGCAGC 
CCAGAATGCC 
CACGTCCCTC 
AGGCGTGACC 
GTTGATGGCG 
GGTAGTGGCC 
AGTCGAGGCT 
AGTCAACCAG 
AAAGCCGGGC 
GTTGGCGAGG 
TGCTCATCCT 
GAACGTCGAT 
CGCACCTTCC 
GGTTGGTGAG 
GGATGCCGGT 
GGGTCTGGAG 
GGAGGTGTGC 
GGTGCGTGGT 
GGGTGGTTGG 
GCATACGGCC 
TGGTGGCAGT 
TCGGGTGTCG 
GGATCTGGGT 
TTTGGCGGAG 
GGTGAATCTG 
CAATGCCGGT 
TGATGCGTTG 
GATGTGGGCT 
GGGGGTGCGG 
TGGTGAGGCG 
TTCTGCCCGT 
CCAGGTCCAG 
GTTGAAGCGG 
CCGTGCTCAG 
GGCGTTCAAG 
CGCGGCCACC 
CATCGCACGC 
GTCTTCACCG 
AGGCGCTGGG 
GGTGCAGAAC 
CGCCTGGCGC 
CCCCACCGCC 
CCAACAGAAT 
CTGCCGTTTC 
CCGCGACGCG 
CCCGGACCCG 
AGCCCAGTTC 
GCAGCAGCGG 
GTACACAGTC 



GCTGCCGGTG 
GCACAGGCGG 
TCATTGGTGG 
GACTCCATGG 
TCGGGTGTGG 
CAGTGGGTGG 
GCGGAGTGCG 
GGCAGGGACG 
ATGGTGTCAC 
CATTCGCAGG 
GCGCGGGTGG 
GTCTCCGTCA 
TTGTCGGTGG 
CTGGATGAGT 
GACTATGCTT 
GCGGACATCA 
CTGGGCACGA 
TTCCAGGAAG 
CCGCACCCCG 
GCCGTATTCG 
GCGGAAGCCT 
CCTCGCACCG 
GAAGAGGCAC 
GATGCGGATG 
GTAATGCCGG 
TGGCGCTACG 
AACTGGCTCG 
ACGGCAGCCG 
GACCGGTCGC 
CACCTCGTGT 
TGTCTTGCCG 
GGGCCGCGGT 
GCGGTGATTG 
CATCCCGAGT 
CGGCGGTTCG 
TCGGGTGTGT 
CGGCCGCGTG 
CGGTGGTTGG 
GCGCCTGGTG 
GTGCGGGCCT 
GAGCCGGTGA 
ATCTCTGTCC 
GGTGAGTTGG 
GTGTGGGGCA 
GCGGTGCGTC 
GGTGAGGGGA 
GCGATGGATC 
TTCGTCGCGG 
CCCCGTCCGT 
GGCGGGGGCC 
TTGTCGGGGT 
GCTGCCGTTG 
GAGTTGGGTT 
GGGATCCGGC 
TTCCTGCAGT 
GAAGACGAGG 
CTTCTTGACC 
CCTGAGCCGA 
TCGGCGAAAT 
AAATATGTGG 
CACTCGCTTC 
GGCGGGGGCA 
GTGGCGGGGC 
GAGAACCCCG 
GATGCGGGGT 
TTGCTGCTGG 
AGGGGAACGG 



ATCTGGTGGT 
CACGCTTGGC 
CCACGAGGTC 
CGGGTTCGTT 
CTCCGGCTGA 
GGATGGCGGC 
CTGCGGTGCT 
CGACTGTTCT 
TGGCTCGGAC 
GTGAGATTGC 
TGGTGTTGCG 
GCCTGCCGGC 
CTGCGGTCAA 
TGTTGGCCGG 
CCCACTCCGC 
CTCCGCAGGA 
CTGCCCTGGG 
CCGTCGAAGG 
TCCTCGTCCC 
GCTCGCTGCG 
TCGTCCAGGG 
TCGACCTGCC 
CGGTCTCTCA 
CCGAGGCTGC 
CGTTGTCTTC 
ACGTTGCGTG 
TCGTGACTCC 
CAGAACTGGG 
AATACGCGCA 
CCTTGCTGGC 
CGTCGCTGGT 
TGTGGCTGGT 
ATCCGGTACA 
TGTGGGGTGG 
TGGGTGTTGT 
GGGTGCGTCG 
GGACGGTGTT 
TGGGTGGTGG 
CTGGGGATCT 
GTGATGTGGC 
CGGCGGTGTT 
AGGAGGCGGC 
TGGATCCCTG 
GTGGGGGGCA 
GTCGGGGTGT 
TGGCGTCGGT 
CCGAGCGTGC 
TCGCTGATGT 
TGATCAGTGA 
AGGGGTTGGG 
TGTCTCGTGT 
TTCTCGGGCA 
TTGATTCCCT 
TGCCGGCCAC 
CTCAGCTCCT 
TCCGCCAGGC 
CACTGCTCGC 
CCACCGAATC 
CGACGGCTGA 
AAGCGCTCCG 
TCGCCGCCTC 
TCGACTCGCC 
TTCCCGAGGA 
GCACCACGTA 
TCTTCGGGAT 
AGACATCCTG 
CGACGGGGAT 



GCCGTGGGTG 
TGCGCACGTG 
GGTGTTCGAG 
GGCCGGCTTC 
GGGTCGTCGT 
TGGGTTGCTG 
GGATCCGGTG 
TGGGCGGGTT 
CTGGCGGTAT 
TGCGGCTTGT 
GAGCCGGGCG 
CGGCCGTGTC 
TGGCCCGTCC 
TTGTGAGCGG 
GCAGATGGAC 
CTCCAGTGTT 
TGCGGGGTAC 
GCTTGTGGCT 
CGGTATCGAG 
GCGTGACGAA 
CGTTCCCGTC 
CACCTACCCC 
GCCCCCTCAC 
TGCTGAACTT 
GTGGCACCGG 
GAAGCGTCTG 
AGCAGGAACC 
CGTATCCGTC 
TGCGCTGCGT 
CCTGGACCAG 
GTTGGCGCAG 
GACGCGGGGT 
GGCGCAGGTG 
GCTGATCGAC 
GGCGTCGGCT 
TCTGGTGCGT 
GGTCACGGGT 
GGCGGATCAT 
GGTGCGGGAG 
TGATCGTGTG 
CCATGCGGCT 
TGATGTGATG 
TGGTCTGGAG 
GGCGGTGTAT 
TGGTCTGCCG 
GGGTGGTGCG 
TGTGGCGGTG 
GGACTGGGAA 
CCTGCCGGAG 
CTTGGTCGGT 
GCGGCAGGAG 
TGGTTCCGCG 
CACTGCTGTC 
CATGGCATTC 
TCCTGACGCC 
ATTGGCGTCC 
TCTGACACGC 
GATCGACGAG 
GCCACTGACC 
TGCGTCGCTC 
CCGTGAAGCG 
CGAAGATCTC 
CCGCGGGTGG 
CGTCCGGGAA 
TTCGCCGCGT 
GGAGCTTTTC 
ATTCATCGGA 



GTGTCCGGGC 
TCGGGTGTAA 
CACCGGGCTG 
GCTGCGGGTG 
GTGGTGTTCG 
GATGCGTGTC 
ACGGGTTGGT 
GATGTGGTGC 
TACGGTGTGG 
GTGGCTGGGG 
ATCGCCCGGA 
CGCACCATGC 
TCGACCGTGG 
GAGGGTGTCC 
CAGTTACGCG 
CCGTTTTTCT 
TGGTTCACGA 
CAGGGGATGG 
CAGACCCTCG 
GGCGGCCTGG 
GACTGGTCCC 
TTCCAACGAC 
TCGGAGAACA 
CTGGGTGTCG 
CAGAGCCAAC 
ACCACCGGGG 
GACACCACGT 
AGCTTTGCGC 
CAAGCCCTGA 
GCCACTGACG 
GCGTTGGTTG 
GCGGTGGTTG 
TGGGGTTTCG 
CTGCCGGTGG 
GGTTTTGAGG 
GCTGTGGTGG 
GGTCTTGGTG 
GTGGTTCTTG 
CTGGAGGGGT 
GCGTTGCGGG 
GGTGTTCCTC 
GCGGCCAAGG 
GCGTTTGTGT 
GCGGCGGCGA 
GCCACGAGTG 
GCGCGGGAGT 
ATGGCTGATG 
CGTTTCGTCA 
GTGCGTGCTG 
GAGGAGGAGT 
GAGGAGTTGG 
CAGGACGTCC 
GAGCTACGCA 
GATCATCCCA 
GAGAGCGAGT 
CTTTCCCTGG 
CTCCGGGAGA 
ATGGATGGCG 
ACTGGAGCTG 
AAGGAGAACG 
ATCGCCATCA 
TGGCGCTTCC 
GATCTGGATG 
GGCGCGTTCC 
GAGGCGTTGG 
GAGCGTGCCG 
GCCGGACATC
35221 AGGGCTATGG TCCCGACCCC AAGAGGGCTC
35281 GAACGGCATC GGCCGTGCTG TCCGGGCGTA 
35341 CGGTCACGGT GGACACGGCG TGTTCGTCAT 
35401 CGCTGCGCCG GGGCGAGTGC TCACTCGCCA 
35461 CGGATGCCTT CGTGGAGTTC AGCCGCCAAC 
35521 CATTCGCCGC GGCAGCGGAC GGTATGGGAT 
35581 AGCGGTTGTC GGATGCGCGG CGGTTGGGTC 
35641 CGGTCAATCA GGATGGTGCG TCGAATGGCC 
35701 GGGTGATTCG TGCGGCGTTG GCTGACGCGG 
35761 AGGCGCATGG TACGGGGACG CGGTTGGGTG 
35821 CGTATGGGCA GGGGCGTGCG GGTGGGCGTC 
35881 TCGGGCATAC GCAGGCGGCG GCTGGTGTGG 
35941 GGCGGGGTGT GGTGCCGAAG ACGTTGCATG 
36001 CGGCCGGTGC GGTGGAGTTG CTGACTGAAG 
36061 TTCGTCGGGC AGGCATCTCC GCCTTCGGTG 
36121 AGGAGGCGCC TGCGGAACCG GAGCCGGAGC 
36181 TGGTGGTGCC GTGGGTGGTG TCCGGGCGGG 
36241 GCTTGGCTGC GCACGTGTCG AGCACGGGTG 
36301 TGGCCACGAG GTCGGTGTTC GAGCACCGGG 
36361 TGGCGGGGTC GTTGGCCGGG TTTGCTGCGG 
36421 TGGCGCCGGC TGAGGGTCGT CGTGTGGTGT 
36481 TGGGGATGGC GGCTGGGTTG CTGGATGCGT 
36541 GTGCCGCGGT GCTGGATCCG GTGACGGGTT 
36601 ACGCGACTGT TCTTGGGCGG GTTGATGTGG 
36661 CACTGGCTCG GACCTGGCGG TATTACGGTG 
36721 AGGGTGAGAT TGCTGCGGCT TGTGTGGCTG 
36781 TGGTGGTGTT GCGGAGCCGG GCGATCGCCC 
36841 TCAGTCTCCC GGCCGGCCGT GTCCGCACCA 
36901 TCGCGGCGGT CAACGGTCCG TCCTCGACCG 
36961 AGTTGTTGGC CGGTTGTGAG CGGGAGGGTG 
37021 CCTCCCACTC CGCGCAGATG GACCAGTTAC 
37081 TCACTCCGCA GGACTCCAGT GTTCCGTTCT 
37141 CGACCGCTCT GGATGCGGGG TACTGGTTCA 
37201 AAGCCGTCGA AGGGCTTGTG GCTCAGGGGA 
37261 CCGTCCTCGT CCCCGGTATC GAGCAGACCC 
37321 TCGGCTCGCT GCGGCGTGAC GAAGGCGGCC 
37381 CCTTCGTCCA AGGCGTTCCC GTCGATTGGA 
37441 CCGTCGACCT GCCCACCTAC CCCTTCCAAC
37501 CATCGTCTGC GAATGGCGTT GACGGTGAGG 
37561 GTGAGGACTC GGTCGCTGTA GCCGAGGAGT 
37621 TGTTGCCGGC CTTGTCGTCG TGGCGGCGGC
37681 GGCGTTACCG GGTGGAGTGG AAGCCTTTCC
37741 GCTGGTTGTT CGTGGTGCCG CGGGGCTTGG 
37801 CTGCCGTCAC GGCGCGGGGT GGCGAGGTCA 
37861 ACCGCCGGGC TTATGCGGAG GCTGTCGCGG 
37921 TGTCCTGGGA TGATCGGCGG CACTCGGAGC
37981 CGCTGGTGTT GGCGCAGGCG TTGGTTGATC
38041 GGCTGGTGAC GCGGGATGCG GTGGTCGCTG 
38101 CGGTACAGGC GCAGGTGTGG GGTTTCGGGC 
38161 GGGGTGGGCT GATCGACCTG CCGGTGGAGG 
38221 CGTATGCCGA CCTGCTCGCC ACGGTTGTGG 
38281 TGCGTGGTTC GGGTGTGTGG GTGCGTCGTC 
38341 GTGGTTGGCG GCCGCGTGGG ACGGTGTTGG 
38401 ATACGGCCCG GTGGTTGGTG GGTGGTGGGG 
38461 GTGGCAGTGC GCCTGGTGCT GGGGATCTGG 
38521 GGGTGTCGGT GCGGGCCTGT GATGTGGCTG 
38581 ATCTGGGTGA GCCGGTGACG GCGGTGTTCC 
38641 TGGCGGAGAT CTCTGTCCAG GAGGCGGCTG 
38701 TGAATCTGGG TGAGTTGGTG GATCCCTGTG 
38761 ATGCCGGTGT GTGGGGCAGT GGGGGGCAGG 
38821 ATGCGTTGGC GGTGCGTCGT CGGGGTGTTG 
38881 TGTGGGCTGG TGAGGGGATG GCGTCGGTGG 
38941 GGGTGCGGGC GATGGATCCC GAGCGTGCTG 
39001 GTGAGGCGTT CGTCGCGGTC GCCGATGTGG 
39061 CTGCCCGTCC CCGTCCGTTG ATCAGCGACC 
39121 AGGAGCAGGA GCAACTCCAC GCCCCCGTCC 
39181 GGCTGTCCAT GCTGTCTCCC GCCGGACGGG 



CGGAGAGCGT GGCGGGTTAC CTGCTGACGG 
TTTCCTACAC GTTCGGTCTT GAGGGGCCTG 
CGCTGGTGGC ACTGCACCTG GCGGTGCAGG 
TAGCCGGCGG TGTGGCCGTC ATGTCGACCC 
AGGGCATGGC AAGAGACGGC CGATGTAAGG 
GGGGCGAGGG AGTTTCGCTG CTCTTGCTGG 
ATCGGGTGTT GGCGGTGGTG CGGGGGAGTG 
TGGCGGCGCC GAATGGTCCG TCGCAGCAGC 
GTCTGGCTCC TGCCGATGTG GATGTGGTGG 
ATCCGATCGA GGCTCAGGCG TTGCTGGCGA 
CGGTGTGGCT GGGGTCGGTG AAGTCGAACA 
CTGGTGTGAT GAAGATGGTG CTGGCGTTGG 
TGGATGAGCC GTCACCGCAC GTGGACTGGT 
AGCGGCCGTG GGAGCCGGAG GCTGAGCGTC 
TCAGTGGCAC GAACGCGCAT GTGATCGTGG 
CGGGAACTCG TGTGGTTGCT GCCGGTGATC 
ATGTGGGGGC GTTGCGTGAG CAGGCGGCAC 
CGGGTGTGGT TGATGTGGGC TGGTCGTTGG 
CGGTGATGGT CGGCACTGAT CTTGATTCCA 
GTGGTGTCGT CCCCGGGGTG GTGTCGGGTG 
TCGTCTTTCC TGGTCAGGGT TCGCAGTGGG 
GCCCGGTGTT CGCGGAGGCG GTGGCGGAGT 
GGTCGCTGGT CGAGGTGTTG CAGGGCAGGG 
TGCAGCCGGC GTTGTGGGCG GTGATGGTGT 
TGGAGCCTGC TGCGGTTGTG GGGCATTCGC 
GGGGGTTGAG TCTGGCCGAT GGTGCGCGGG 
GGATCGCTGG TGGGGGCGGC ATGGTCTCCG 
TGCTCGACAC CTACGGCGGC CGGGTTTCGG 
TGGTGTCCGG TGACGTCCAG GCCCTTGATG 
TCCGGGCTCG TCGTGTCCCG GTGGACTATG 
GCGATGAGCT GCTGGAGGCG CTGGCGGACA 
TCTCGACGGT GACGGCGGAC TGGCTGGACA 
CGAATCTGCG GGAGACGGTC CGGTTCCAGG 
TGGGCGCGTT CGTCGAGTGC AGCCCGCACC 
TCGACGCCCT CGACCAGAAT GCCGCCGTAC 
TGGACCGACT TCTCACATCC CTCGCGGAAG 
CCCACGCCTT CGAGGGCGTG ACCCCTCGCA 
GGCAACGTTT CTGGTTGGAC GGTTCGCCGG 
CGGACGCCAT GATCTGGGAC GCGGTCGAGC 
TGGGGATCGA CGCCGAGGCT TTGCACACGG 
GTCGGGTGGA GCATCGACGG CTTCAGGACT 
CGGCCGCGCT TGATGAGGTG CTCGGTGGTG 
CGGATGATGG TGTGGTTGCG CGGGTGGTGG 
GTGTCGTGGA GCTCGATCCG ACCCGTCCTG 
GCCGTGGTGT GAGCGGGGTC GTGTCGTTCT 
ATCCTGTTGT TCCCGCCGGT CTTGCCGCGT 
TTGGCCGGGT TGGTGAGGGG CCGCGGTTGT 
GTCCTTCGGA TGCCGGTGCG GTGATTGATC 
GTGTTCTGGG TCTGGAGCAT CCCGAGTTGT 
CGCCCGAACC TGGCTCGACG TGCGACCACA 
CGTCGGCTGG TTTTGAGGAT CAGGTGGCGG 
TGGTGCGTGC TGTGGTGGAT GGTGGTGGGG 
TCACGGGTGG TCTTGGTGGT TTGGGTGCGC 
CGGATCATGT GGTGCTTGTG AGCCGTCGTG 
TGCGGGAGCT GGAGGGGTTG GGCGGGGCTC 
ATCGTGTGGC GTTGCGGGCG TTGTTGTCGG 
ATGCGGCTGG TGTTCCTCAG TCGACGCCTT 
ATGTGATGGC GGCCAAGGTG GCGGGTGCGG 
GTCTGGAGGC GTTTGTGTTG TTCTCCTCCA 
CGGTGTATGC GGCGGCGAAT GCGTTTCTTG 
GTCTGCCGGC GACGAGTGTG GCGTGGGGGA 
GTGGTGCGGC GCGGGAGTTG TCCCGTCGGG 
TGGCGGTGAT GGCTGATGCG GTGGGGCGTG 
ACTGGGAACG TTTCGTCACC GGTTTCGCCT 
TCCCGGAGGT CCGTACCGCC CTGCGGAACC 
CCGAGGACCG ATCGGCACAG CTTCTGCGGC 
AAGCCGAACT GGTGAAGCTC GTCCGTACCG
39241 AGGCAGCCGC TGTTCTGGGG CACGGCTCCG
39301 AGGAGCTGGG CTTCGACTCC CTCACCGCTG 
39361 CCGGCACCAG GCTCCCCGCC AGCGCCGTCT 
39421 GGTGGCTGCT CGCGGGGATG CGGCATGCCG 
39481 GACCCGGGCC GGACGCCGAC GAAGGTCGGT 
39541 CCGATCTGTA CCGGCGTTCC GCCGAGTTGG 
39601 CCGACACCGC GGCCTTCCGC CCGGTGTTCC 
39661 AGGCCGTTCC GCTGGCGGAC GGGGTGCGCA 
39721 CGCCGGTCGG CGGGCCGCAC GAGTTCGCGC 
39781 CGGTCTCGGC GCTTCCGCTG CCCGGCTACC 
39841 ACGCCGTGCT CGCCGCGCAG GCCGAGGCGG 
39901 TCCTGGTCGG CTACTCGGCG GGCGGACTGA 
39961 GGCGCGGCAC ACCGCCGAGC GGTGAGGTGC 
40021 AACCGGTGTT CGGCTGGCAG AAGGAGCTCA 
40081 CCATGGACGA TACGCGGCTG ACGGCCCTCG 
40141 GGCCGGCGCC CTCCGGACTG CCCACCCTCC 
40201 GGACCGGGGC CATCGACTGG CGGGCCTCCT 
40261 CGGGGAACCA CTTCACGATC ATGCGCGAGC 
40321 TCTGGCTGAA GGGGCTCACC CCCTGACACC 
40381 GGCGTCCCGG TCCTCCCGAC CCGCGTGCGC 
40441 GGCATGCCCC GCTTTCCTCC CCCTCTCCGA 
40501 CCGGTGAAGG AGCGTGTTGC ACTCATGCAG
40561 AGTGTCGAAC ACGCGGCGGA CGCAGCTCGA 
40621 GGAGATGGAG GACAGCGAAC TGGGGCGCCG
40681 CTTCGGCGCC AACGGCGATC CGTACGCCCG 
40741 ACCTTTCTAC GACGCGATAC GGACCCTGGG 
40801 GGTCACCGCC GACCCCGGGC TCGGGGGCCG 
40861 GGAAGGCTCG TGGCCGGTGC GGGCGAAGAC
40921 GCACCAGGCG TTCCTGCGGC TGGAGCGCGA 
40981 GCCGGTGCTG GGGGCCGCGG CGGTCGACGC 
41041 GGGGCTCGCG AAGGGGCTGC CGGACACGTT 
41101 GCCGGTCGAG GTGCTGGCGC GGATCTGGGG 
41161 GCGTGACTGC CGGGCGCTCG CTCCCGCGCT 
41221 GCTGAGCAAG GACATGGCGT CCGCCCTGGA 
41281 CGCGACGCCG CGCCTCGCCG GCCCCGCCGA 
41341 CGTTCTGCTC TGCACGGAGC CGGTGACCAC 
41401 TCCCGGGCAG TGGCCCGTGC CCTGCACCGG 
41461 GGCGCTGCAC CGGGCGGTGT CGTACCGTAT 
41521 GTTGGCGGGC TGCGAGGTCA AGTCCGGTGA 
41581 CCGGAACGGA CCGTCCGCAG CCGCCCCGCC 
41641 CCCGTCGGTC TTCGGTGCCG CCGCCTTCGA 
41701 TGTGACGGGA GCGGCCCTCC AGGCCCTCGC 
41761 ACCCGTCGTA CGACGGCGGC GTTCCCCTGT 
41821 CGCCGCATGA GCATCGCGTC GAACGGCGCG 
41881 ATGATGACCA CCTTCGCGGC CAACACGCAC 
41941 CTGCGGACAG CCGGGCACGA GGTGCGCGTG 
42001 ACGCAGGCGG GGCTCACCTC GGTCCCGGTG 
42061 GCGACCTGGG GCGACGATGC CTACATCGGC 
42121 CCCGGCCTGT GGACGTGGCC GTACCTCCTG 
42181 TACGAGTTGC TGAACAACGA GTCCTTCGTG
42241 CGGCCCGACC TGGTGATCTG GGAGCCGCTG 
42301 ACCGGCGCGG CCCACGCCCG GCTGCCGTGG 
42361 GCGTTCCTCG CCGAGCGTGC CCTGCAACCG 
42421 TGGCTGGGCC GCATGCTCGA CCGGTACGGC 
42481 CAGTGGACCA TCGACACGCT GCCGCGCAGC 
42541 ACCCTGGACA TGCGGTACGT GCCGTACAAC 
42601 GAACCGTGCG AGCGGCCCCG GGTCTGTCTG
42661 CGGGACCATG TCCCCCTCGA CCACCTGCTC 
42721 GTGGCCACGC TCGACACCAC CCAGCAGGAG 
42781 CGGCTGGTGG ACTTCGTCCC GCTGCACGCG 
42841 CACGGTGGTC CGGGCACGTG GTCGACGGCG 
42901 GACACCTCGT GGGACACACC GGTGCGGGCG 
42961 TCGATGCCGG TGGGGGAACT GGGCGTCGAG 
43021 GGGGAGCCGG AGTTCCGCGC GGGCGCCGAG 
43081 GCCCCCGGTG ACGTCGTACC GGACCTGGAA 
43141 ATGGCGGGAA GGCGGTGAGA CGATGCGCGT
43201 CTTCCACGGG CTGGTGCCGC TGGCGTGGGC 



CGCAGGACGT CCCGGCCGAG CGGGCGTTCA 
TTCAGCTACG CAACAGACTG GCCGCCGCCA 
TCGACCACCC CCACGCTGCG GCTCTCGCCA 
ACGGTGGACA CGGTGGTGGG CACGCCGGTG 
CGGCCGGCGC TGGTCACAGC GGAATGCTGG 
GCCGGAGCCG GGAGTTCATC GGGCTGCTGG 
ACGGGCCGGC GGACCTCGAC GCGCCGTTGG 
AACCGCAGTT GATCTGTTGC AGCGGGACCG 
GCCTGGCTTC GTTCTTCCGC GGCACTCGTG 
TGCCCGGTGA GCAGTTGCCC GCGGACCTCG 
TCGAGAAGCA GACCGGGGGT GCGCCGTTCG 
TGGCCCACGC ACTGGCCTGC CACCTGGCCG 
TGGTGGACGT CTATCCGCCG GGCCGGCAGG 
CCGAGGGCAT GTTCGCCCAG GACTTCGTGC 
GCACGTACGA CCGTCTCATG GGCGAGTGGC 
TGATCCGGGC CACCGAACCC ATGGCGGAGT 
GGGAGTACGA CCACACCGCC GTCGACATGC 
ACGCGGAGGA CGCGGCCCGG CACATCGACG 
TGCCCGCACC CTGTGACTCC TGCCCGTACC 
AACGGACGAG TCGCTCAGGA GGTCCCCATC 
ACGCATCGAC GACCCGATCC CCCTCAGGGA 
GACATGCAAG GCGTACAGCC CGAACCAGCC 
ACAGAGCGAA CGGCGCACGG AAGCCGCCCA 
CCTGCAGATG CTCCGCGGCA TGCAGTGGGT 
GCTGCTGTGT GGCATGGAGG ATGACCCGTC 
CGAGCTGCAC CGGAGCAGGA CCGGAGCCTG 
CATCCTCGCC GACCGGAAGG CTCGGTGCCC 
CGACGGGCTG GAGCAGTACG TGCTGCCCGG 
GGAGGCCGAG CGACTGCGGG AGGTCGCGGC 
GTGGCGCCCG CTGATCGACG AGGTCTGCGC 
CGACCTGGTC GAGGAGTACG CGGGGCTGGT 
CGTCCCGGAG GAGGACCGCG CCCGGTTCGG 
GGACAGCCTC CTGTGTCCCC AGCAGTTGGC 
GGACCTGCGT CTCCTCTTCG ACGGCCTCGA 
CGGTGACGGA ACGGCCGTGG CCATGCTCAC 
GGCGATCGGG AACACCGTGC TCGGGCTCCT 
CCGGGTGGCT GCCGGGCAGG TTGCCGGGCA 
CGCGACGCGG TTCGCCCGGG AGGACCTGGA 
CGAGGTGGTG GTCCTGGCCG GAGCGATCGG 
TGCCCCACCG GGCCCAGCGG CCCCGCCCGC 
GAACGCGCTG GCCGAACCCC TCGTCCGGGC 
GGAGGGGCCC CCCCGGCTGA CGGCGGCGGG 
CGTCGGCGGG CTGCACCGGG CTCCGGTGGC 
CGCTCGGCCC CCCGCCGGCC CCTGCGCGTG 
TTCCAGCCGC TGGTTCCCCT GGCCTGGGCA 
GTGAGCCAGC CCTCGCTGAG CGACGTGGTG 
GGCACCGAGG CTCCGGTCGA GCAGTTCGCG 
GTCAACAGCA TCGACTTCAC CGGCAACGAC 
GGCATGGAGA CCATGCTGGT GCCGGCCTTC 
GACGGCGTAG TCGAGTTCGC CCGTGACTGG 
ACGTTCGCCG GCGCGGTGGC GGCGCGCGTC 
GGGCAGGAGA TCACCCTGCG CGGGCGGCAG 
TTCGAGCACC GGGAGGATCC CACGGCCGAG 
TGCTCGTTCG ACGAGGAGAT GGTCACCGGG 
ATGCGGCTGG AGCTGTCCGA GGAGCTGCGC 
GGACCGGCGG TCGTACCCCC CTGGGTGTGG 
ACGATCGGCA CCTCCCAGCG TGACTCCGGC 
GACTCCCTCG CCGACGTGGA CGCGGAGATC
CGCCTGCGGG GCGCGGCCCC CGGCAACGTC 
CTGATGCCGA CCTGCTCGGC GATCGTGCAC 
GCGCTCCACG GCGTCCCGCA GATCATCCTG 
CAGCGCATGC AGCAACTCGG GGCGGGCCTG 
GCGCTGCGGG ACCGGGTCCT GCGGCTGCTG 
CGGATCCGGG CCGAGATGCT CGCGATGCCC 
CGACTCACCG CGGAGCATGC CACCGGCGCG 
ACTGCTGACC TGCTTCGCCA ACGACACCCA 
GCTGCGGGCC GCCGGGCACG AAGTCCGCGT
43261 GGCCAGTCAG CCCGCCCTGT CCGACACGAT
43321 GGGCCGGGAC ACCGCCTTCC TGGAGCTGAT 
43381 CTCCACCGGC ATCGACCTGG GCGTCCGCGC 
43441 CATGCACACG ACCCTGGTGC CCACGTTCTA 
43501 CGGGCTCGTC GCGCTGACCC GGGCCTGGCG
43561 CTTCGCCGGG GCGTTGGCGG CGCGGGCCAC 
43621 GTCGGACCTC ATCGTCCGGT TCCGCCGGGA 
43681 CGAGCACCGC GAGGACCCCA TGGCGGAGTG
43741 CACCTTCGAC GAGGAGCTGG TGACCGGGCA 
43801 GCGGCTGCCC ACCGGGACGA CGACGGTGCC
43861 CGTGGTCCCC GCATGGGTCC GGCAGCGTGC 

43921 TGTGTCGGCC CGGCAGACCC TGGGCGACGG
43981 GGGCGACGTG GACGCGGAGA TCGTGGCCAC
44041 GCCGGTGCCG GACAACGTCC GGCTGGTGGA
44101 CTGTTCGGCG ATCGTGCACC ACGGCGGCGC
44161 CGTCCCGCAG ATCGTCCTCG GTGACCTCTG 
44221 GGCCGCGGGC GCGGGCCTGT TCATCCATCC
44281 GGGCGTGCGC CGGGTGCTGA CGGACCCTTC 
44341 CGAGATGAAT GCAGAGCCGA CGCCGGGCGA
44401 GAGCGGCGGA CGCGGACGAG GAGGCGGGAA
44461 GGTACGAGGA CGAGTTCGCC GAGATCTACG
44521 ACGCCGGCGA GGCGAAGGAC GTGGCGGACC
44581 CCCTCCTGGA CGTGGCCTGC GGCACGGGCG
44641 ACGACGCCCG CGGTCTCGAA CTGTCCGCGA
44701 CGGGCGTGCC GCTGCACCAA GGGGACATGC
44761 CGGTCACCTG CATGTTCAGC TCCGTCGGCC
44821 CGCTGCGGTG CTTCGCCCGG CACACCCGGC 
44881 GGTTCCCGGA GACCTTCACC GACGGCTACG 
44941 GGACCATCTC CCGGGTGTCC CACTCGGTAC
45001 ACTACGTGAT CGCCGACGCC GAGCACGGTC
45061 CGCTGTTCCC GCGGCATGCG TACACGGCCG 
45121 ACCTCGACGG CGGGCCCTCG GGCCGGGGGC 
45181 CGCGCACCGC CCGATCACCC TGCTCAACGC 
45241 GACCTTTCAC ATGTCGTACG ACGACCACGC 
45301 AGGTGACGAG CGCTTCCTGC TGAACACCGT
45361 GGCGCTCGTG GACGAGTTGC TGTTCCGCTG 
45421 CATCGGCCTG GACGTCCTGC ACGGCGCCGA
45481 CGGCAAGCCG GTCACGTCGG CGGAACCGGC
45541 TTCACGCTCA GCGACCCTCC TGCGGGAGCT 
45601 GGGCTTCGGC GTCTCCTTCC TGCCCGACCT 
45661 CCTGGCCGCC CGCGCCACCA ACGTGGTGCT 
45721 GGACCGGCTG GCCCTGCGCT ACGAGTCCGA 
45781 CCACTACGAC CGGCACCTGC GGGCCGTGCG
45841 CATCGGCGGC TACGACGACC TGCTGCCGAG 
45901 CTTCCCGCGC GGCCTGGTCT TCGGCGTGGA
45961 CGTGTCAAGA CGCTCCGCGG CCCGGCAGGA
46021 GGAGCACGGG CCGTTCGACG TCATCATCGA
46081 GACGTCGTTC TCGGTGATGT TCCCCCACCT
46141 CACCTTCACC TCCTACTGGC CCGGGTACGG
46201 AACAACCGCG CTGGAGATGG TCAAGGGACT
46261 GGACGGCGCG GCCACGGCCG ACTACATCGC
46321 AACGACCTCG TCTTCCTCGA GAAGGGCGAT
46381 GCCCCGGGAG CCGTTCTGGA ACGACAACTA
46441 ACCACTGTCC GCGCCACCTC GGAACCACCT
46501 CGCACACCGG ACCGACACCG GCCGACGCGG
46561 CCCTGGACCT CGACCCGCAC TACGCCGAAC
46621 GCCTGCCCTA CGGCGAGGGC ACGGCCTGGC
46681 TTCTGGGCGA CTCCCGCTTC AGCACCGCGG
46741 TCCCCACCCC GCCCGAGCCG GACGGCGTCC
46801 TGCGGCGGCT GGTGGGCAAG GCCTTCACGG
46861 TCCGCTCCCT CGTCGACTCC CTGCTCGACG 
46921 TGGTCGAGTT CCTCGCCGTT CCCTTCCCCG
46981 CCTTGGAGGA CCGCGACCTG TTCCGGACCT
47041 TCACCGCCGC GGAGATACAG CGGGTCCAGC
47101 TCGCCCAGCG CCGCGACGCC CCCACCGAGG 
47161 ACAACGACGA CCACCTGACC AAGGGCGAGA 
47221 CGGGCCACGA GACGTCGGTC AACCAGATCA



CACCCAAGCG GGACTGACCG CGGTGCCCGT 
GGGGGAGATC GGCGCGGACG TCCAGAAGTA 
GGAGCTGACG AGCTGGGAGT ACCTGCTCGG 
CTCGCTGGTC AACGACGAGC CGTTCGTCGA 
GCCCGACCTC ATCCTGTGGG AGCACTTCAG 
CGGCACGCCC CACGCCCGCG TGCTGTGGGG 
CTTCCTCGCG GAGCGGGCGA ACCGGCCCGC 
GCTGGGCTGG GCGGCCGAAC GGCTGGGCTC 
GTGGACGATC GACCCGCTGC CGCGGAGCAT 
GATGCGGTAC GTGCCGTACA ACGGGCGGGC 
GCGGCGGCCC CGGATCTGCC TGACGCTCGG 
CGTGTCGCTG GCGGAGGTGC TGGCCGCGCT 
GCTGGACGCC TCCCAGCGCA AGCTCCTGGG 
CTTCGTGCCC CTGCACGCCC TGATGCCGAC 
CGGTACCTGG CTGACGGCCG CCGTCCACGG 
GGACAACCTG CTGCGCGCCC GGCAGACACA 
GTCCGAGGTC ACCGCGGCCG GGCTCGGTGA 
CATCCGGGCC GCCGCACAGC GCGTCCGGGA 
GGTCGTCACG GTGCTGGAGC GGCTCGCCGC 
CCATGCGGGC TGACACGGAG CCGACCACCG 
ACGCCGTGTA CCGGGGCCGG GGCAAGGACT 
TCGTGCGCGA CCGGGTGCCG GACGCGTCCT 
CGCACCTGCG GCACTTCGCC ACGCTCTTCG 
GCATGCTGGA CATCGCCCGC TCCCGGATGC 
GATCCTTCGA CCTGGGGCCA CGCGTCTCCG 
ACCTGGCCAC CACCGCCGAA CTCGACGCGA 
CCGGCGGCGT GGCCGTCATC GAACCGTGGT 
TGGCGGGTGA CATCGTACGC GTCGACGGCC 
GGGACGGCGG CGCCACCCGC ATGGAGATCC 
CCCGGCACCT GGTCGAGCAC CACCGCATCA 
CGTACGAGAA GGCGGGCTAC ACCGTCGAGT 
TGTTCGTCGG CACCCGGACG TGAACCCGCC 
CGTTCACACG GATCACCGGA CCACGCGAAG 
GGTGCTGGAA GCGATACTGC GGTGCGCCGG 
CGAGGAATGG GGAGCCGCCG AGATCACCGC 
CGAGATCCCG CAGGTGGGCG GTGAGGCGTT 
CCGGATCAGC CATGTGCTGC AGGTGACGGA 
CGGCCAGGAA CTGGGCGGCC GTACCTGGAG 
GTTCGGCCCG CCGTCCGGCC GCACCGCGGG 
GCGCGGCCCG CGGACCATGG AGGGCGCGGC 
GCACGCGACG ACCAACGAGA CGCCCCCACT 
CAAGTGGGGC GGCGTCCACT GGTTCACCGG 
CGACCAGGCG GTGCGGATCC TGGAGATCGG 
CGGCGCCTCA CTGAAGATGT GGAAGCGCTA 
CATCTTCGAC AGTCGGCGTG CGACCAGCCG 
CGACCCGGAG TTCATGCGCC GCGTCGCCGA 
CGACGGCAGC CACATCAACG CACACATGCG 
GCGCAACGGC GGCTTCTACG TCATCGAGGA 
AGGGCCATCC GGAGCCCGGT GCCCGTCCGG 
GATCGACTCG GTGCACTACG AGGAGCGGCC 
CAGGAACCTC GTCGGGCTGC ACGCCTACCA 
CAACAAGGAG GGCGGCATCC CCCACACCGT 
GCCACGGCCG CAACCAGAGC CGGAAACCGC 
CCAGCAAAGG ACACACCGCT GTGACCGATA 
TACCCGCCTA CCCGTTCAGC CTGCCGCACG 
TCCGCCGGGA CGAACCCGTC TCCAGGGTGC 
TGGTCACCCG CATGTCCGAC GCCCGTATCG 
CCGCCACCGA TCCCGCCACC CCCCGGATGT 
TCGCCCAGGA CCCGCCGGAC CACACCCGGC 
CACGCCGGGT GGAGGAGATG CGGCCCCGTG 
ACATGGTGGC GCACGGTTCA CCCGCCGACC 
TCGCGGTCAT CTGCGAACTG CTCGGCGTGC 
TCTCCGACGC CATGCTCTCC TCGACCCGGC 
AGGACTTCAT GGTCTACATG GACGGCCTGG 
ACCTGCTCGG CGCCCTCGCC CTCGCCACCG 
TCGTCAACAT GGGGGTGAGC CTGCTCATCG 
CCAACCTCGT CCACCTCCTG CTGACCGAGC
47281
47341 
47401 
47461 
47521 
47581 
47641 
47701 
47761 
47821 
47881 
47941 
48001 
48061 
48121 
48181 
48241 
48301 
48361 
48421 
48481 
48541 
48601 
48661 
48721 
48781 
48841 
48901 
48961 
49021 
49081
49141 
49201 
49261 
49321 
49381 
49441 
49501 
49561 
49621 
49681 
49741 
49801 
49861 
49921 
49981 
50041 
50101 
50161 
50221 
50281 
50341 
50401 
50461 
50521 
50581 
50641 
50701 
50761 
50821 
50881 



GCAAGCGCTA 
TGCTGCGGTA 
TGGAGCTGAG 
CCAACCGGGA 
ACCCGCACAT 
TGGAACTCCA 
AGCCGGTCGC 
TCGTCTCCTG 
CCGGCCGGGA 
CGGCCACCCT 
GAGACCACGA 
GCGGGCGACA 
AGACGGGCCA 
AGCGCCGAGG 
CCGGGCAGCC 
TCCTGCGGCA 
TCACCGGCCA 
ATGCGCTCAC 
TTGTACAGCT 
CGGAAACGCA 
TCCTTGGCGT 
TTCGACCGCT 
AGTTCCCGCT 
CGCGCGGCCG 
CGCCGCTCGC 
TGCGAGACGA 
GCCCCGACGT 
AGCAGGCAGG 
TCGCCCGTGA 
GCGTAGCCGT 
TCCAGCTCCT 
AGGGAGCGCA 
AGGTCCGGTG 
CCGTCCGGGC 
TCACCGATGA 
GGCCGTCCGC 
GGTTGCGTGG 
CGGTGTGCGT 
GGGAGCGGCC 
ACGGTGCGCA 
GCGGGAGAAG 
CGTACGCGGT 
ACCACGTCGG 
TGAATGTGTG 
GTCCCGGGCA 
AGCCGAGTCC 
GCGCAAGAAG 
CGGAACGTCC 
ACCGCTGCGC 
CGGGGAACCG 
AAGCTCCGCC 
GCGGTGTCAG 
ACACCGTGGG 
CGATGCGAGC 
CGACGCCCTC 
CTCCCGCGGC 
TGGACCTCTG 
GCCGCCGGCC 
GTCCACGACC 
GCCGTCCTCG 
GCCCTCTTCG 



CGAGTCGCTG 
CACACCGCTG 
CACCGTGACC 
CGAGGAGGTC 
AGCGTTCGGG 
GGAGGCCCTG 
GGGACTGAAG 
GTGACGGCCG 
CCGCAGACCC 
CCATGTGCAT 
CCAGTGCGCC 
GGTGGTTCGT 
GGGCCAGCCG 
GCCGGAAGAG 
CCGCCGCGAA 
GCCAGCCGAT 
GCACGCGCAG 
CGGGGTCGAC 
CGGCGAGTGC 
TGGGTTGAGG 
TGCGGACCCG 
CGGTGTTGCG 
GGCGTGCCAG 
CCTTGGCTTG 
CGTCCACCTC 
CCAGCACGCT 
CGAGGTGGTT 
CCAGGTTGAG 
TGCCCGCCAG 
CGCGGGCCTC 
CGGGCTCCGC 
GTTCGGCGAG 
TCTGGGGGAG 
GGTCCACGCC 
TGCCGACGCG 
CGGGTGCCCG 
GAAGGAGCTT 
TCTGCATGGG 
CACGGCAGAC 
GCGCGAAAAA 
CCGCGTCACC 
GCTCCTCGGC 
CACGATAGCA 
CCACGCTTCG 
GTCGTTCGTC 
GAGACCGGGA 
CTCCTCAGCC 
CGCAATGGCG 
ACACCCGTGC 
GCCCGGGCAG 
GCCGCGCCTG 
TGCCGGGCGG 
CGCATTCCCC 
GAGGGCCGCC 
TCGCCGGCGC 
CCCGGCGGCG 
CGCCCTGCCG 
GCCGTCACCA 
GCTCCCGTGC 
CGCCCCTCGG 
GGTACGGCGG 



GTCGCCGACC 
GTGTCCGCCG 
GTGCGGGCCG 
TTCGACCACG 
CACGGAGCGC 
TCCGCCCTCG 
TGGAAGCAGG 
GCCGCCCGGC 
GGCCGGTGCC 
GCGGCGACCG 
CCGGTAGTGC 
CGGCTCGTCG 
CCTCAACTGC 
CCCGAATCCC 
GGCCGCCAGC 
GCGCTCCGGG 
CAGGGTGCTC 
GGTGAAGGAC 
CCCGCCGCGC 
GGGCCGCGGC 
CGCGGAGATC 
CCGCGGGCCG 
GTCCTCCAGC 
CAGGTATCCC 
CCACAGGGCG 
GCCGCGGTGG 
GGTGGGTTCG 
ACGCGCCTGC 
ACCGAGGCCG 
GAACGCCTCC 
CCCGGCCAGC 
GGCGTGGTCG 
GTAGCCGCAG 
GGCGAGCATG 
CTCGCCGAGT 
GACGACGTCG 
TTCCGGCGTG 
TGATCCGCCA 
GGCGGAAGAA 
AGGGAGGCGA 
GTCCTGCCGG 
GCCCGGACTC 
GAGCAGACGG 
GATTTTTTGC 
AGCGGGAGGT 
CCGAGAGCGA 
GGGAACGCCG 
CCAGGAAGGC 
CCCACAGCTC 
GTGTAGGGTG 
ACGGTGCGGC 
CTGTTTCGTG 
GCAAGGCCGG 
GCCGCGCCGG 
CGCTCCCGGC 
CGTGCCGCGC 
TCCCCGCGGC 
CCCCGGGGCG 
CTGCCGGAAG 
CCGGCCGGGC 
GTGGACCATC 



CGGCCCTCGT 
GCAGCTTCGT 
GGGAGCCCTG 
CCGACGAGCT 
ACCACTGCAT 
TCCGGCGCTT 
GCATGCTGAT 
CGCCCGCCGG 
CCTCGCCCGA 
GTGAACCGCT 
GCCAGCGCCT 
AGCAGCAGCA 
CCGGTGGACA 
AGGAGCGCGC 
AGGCTCTGCT 
CGCTCGCACT 
TTGCCCGCGC 
GGGACGTCGA 
CCGACCGTGC 
ACCGGGTTCT 
TGCTTCTCCA 
GTGGCCAGGT 
CAGTCCTGGT 
GCGTAACCGC 
GTGGCCACGC 
GCCCGCAGGC 
TCGAGCAGCA 
TCACCTCCGG 
TGCATCGCCG 
AGCAGGTCGC 
GCCTGCTCCG 
ATGGCGTCCT 
CCGCCGGGAG 
CGGAGCAGGG 
GCCACCGACT 
TCGAGGACGA 
CCGGTGAGCG 
TTCGGAGAAA 
GAGAACGCCT 
AGAAGCGAGC 
GAACCCTCGA 
CCGTGGCGGT 
AGCCGGCAGG 
TCGCGGGACG 
TCATATGCAG 
CGTCAAGCGG 
GGCGCACGAG 
GAATTTCCGC 
GACTCCGCTG 
GCGGGCATGT 
CCTGAACTCT 
CTGCCCGGTT 
CCTGAGGCCG 
TGGCGACGAC 
CCCGGCCGGC 
ACAGGCCGTA 
AACGTCGCCG 
CCGGCGTCTC 
GGCCGACTCA 
TACCGCCGCC 
TACCTCGCGC 



GCCCGCGGCG 

CCGCGTGGCC 

CGTCGTCCAC 

GGACTTCCAC 

CGGCGCCCAA 

CCCCACCCTC 

CCGCGGACTG 

GCACCGGCGC 

GGCCGCCTCA 

GCGCGAACAT 

CCTCCAGGTC 

GGTCCGCCGG 

GGTCTCCCAC 

CCCGGTGTTC 

GCCGGTCGGT 

CGCCCTGATC 

CGTTGTGCCC 

GCCGCGTGCC 

CGCCACCCTC 

CCTCCAGCCG 

CGTTGCGCTG 

GGTCGGCGGC 

AAGCCTGCTC 

CGCCGTGCCG 

GCTCCAGGAA 

GCTCCTCCAG 

TCAGCTGCGG 

AGAGGCTGCC 

CGTCGACACG 

CGTAGGCGCC 

CCTCACGCAA 

GAACGGTGTC 

CCCGGACGAG 

TCGACTTGCC 

GGTTGACGCC 

CCTGGAAGGA 

CCGCGGCGCC 

AAGAGGCAGT 

CGGCGAACGC 

CGGAGGCGTC 

CGGCGCCGGA 

ATCAGAGGAA 

GGGTCGCGAG 

ACGAGGCCGT 

GACAACCAGG 

AAGTTCCGGG 

GACGCTCGTT 

CGCAAGGCCG 

CGACAGGGGC 

ATCCAGGTGT 

CGTTTCGCGT 

CCGGAGCGAA 

CGACCGATAC 

CACCCCTTCC 

GGCGCCACCC 

CCGGCCGGCC 

GACACGGACA 

GCCGCTCTCG 

TGACCGAGCG 

TCTTCGCCGC 

TCCAGGCGCT 



GTGGAGGAGA 
ACCGAGGACG 
TTCGCGTCGG 
CGTGAGCGCA 
CTGGGCCGAC 
GATCTGGCCG 
GAACGCCAGA 
CACCAGGGCA 
CTCCACGAAG 
GCGGTCGTGG 
CTCCACGAGC 
GTCGCGCAGC 
CGCGGTGCCC 
CTCCGCGATG 
GATCTCCGTC 
GGGCGCCAGG 
CGTGATCAGG 
GACGGTGACC 
CACCCGGGCC 
GCGGACCCGC 
GTGGCGCTGG 
GCTGCGGGCC 
CCAGCGGCGC 
GTTGACGGTG 
GACCCGGTCG 
CCACTCCAGC 
GGACGCGGCC 
GAGCCGCCGG 
GGCGTCCGCC 
GAGCAGGCCC 
CCCCCGCTCC 
CTCCGGGGGC 
GACCTGGCCA 
CGATCCGTTC 
GTCCAACAGC 
ACCGGTCTCA 
GGTATCGGAA 
GTGGCCAAAA 
GGCGCACCCG 
GCGATCAGCG 
GCGGCAACCG 
GTAGTAACTG 
GTGCGATGGC 
GTGCGAACGT 
GTGGATCCGG 
AGGCACTGGA 
CCAAGGTGAA 
GGTGACACCG 
CTGCCCGCGC 
CGGTTCCCTG 
GCCCACCGTC 
CCTGTGGAGC 
ACGAGTTCAC 
GCACCGGCCC 
GGGTACGCCG 
GTTCGGCCGG 
CCGCCCCTCG 
CGCCGGCCCC 
ACACCTCCCC 
CATGGTCCTC 
GGAGCTC 



The above DNA sequence encodes the following 8,8a-deoxyoleandolide 
synthase proteins: 
8,8a-deoxyoleandolide synthase 1:

1 MHVPGEENGH SIAIVGIACR LPGSATPQEF WRLLADSADA LDEPPAGRFP TGSLSSPPAP 
61 RGGFLDSIDT FDADFFNISP REAGVLDPQQ RLALELGWEA LEDAGIVPRH LRGTRTSVFM 
121 GAMWDDYAHL AHARGEAALT RHSLTGTHRG MIANRLSYAL GLQGPSLTVD TGQSSSLAAV 
181 HMACESLARG ESDLALVGGV NLVLDPAGTT GVERFGALSP DGRCYTFDSR ANGYARGEGG 
241 VVWLKPTHR ALADGDTVYC EILGSALNND GATEGLTVPS ARAQADVLRQ AWERARVAPT 
301 DVQYVELHGT GTPAGDPVEA EGLGTALGTA RPAEAFLLVG SVKTNIGHLE GAAGIAGLLK 
361 TVLSIKNRHL PASLNFTSPN PRIDLDALRL RVHTAYGPWP SPDRPLVAGV SSFGMGGTNC 
421 HVVLSELRNA GGDGAGKGPY TGTEDRLGAT EAEKRPDPAT GNGPDPAQDT HRYPPLILSA 
481 RSDAALRAQA ERLRHHLEHS PGQRLRDTAY SLATRRQVFE RHAWTGHDR EDLLNGLRDL 
541 ENGLPAPQVL LGRTPTPEPG GLAFLFSGQG SQQPGMGKRL HQVFPGFRDA LDEVCAELDT 
601 HLGRLLGPEA GPPLRDVMFA ERGTAHSALL SETHYTQAAL FALETALFRL LVQWGLKPDH 
661 LAGHSVGEIA AAHAAGILDL SDAAELVATR GALMRSLPGG GVMLSVQAPE SEVAPLLLGR 
721 EAHVGLAAVN GPDAWVSGE RGHVAAIEQI LRDRGRKSRY LRVSHAFHSP LME PVLEEFA 
781 EAVAGLTFRA PTTPLVSNLT GAPVDDRTMA TPAYWVRHVR EAVRFGDGIR ALGKLGTGSF 
841 LEVGPDGVLT AMARACVTAA PEPGHRGEQG ADADAHTALL LPALRRGRDE ARSLTEAVAR 
901 LHLHGVPMDW TSVLGGDVSR VPLPTYAFQR ESHWLPSGEA HPRPADDTES GTGRTEASPP 
961 RPHDVLHLVR SHAAAVLGHS RAER1DPDRA FRDLGFDSLT ALELRDRLDT ALGLRLPSSV 
1021 LFDHPSPGAL ARFLQGDDTR RPEPGKTNGT RATEPGPDPD DEPIAIVGMA CRFPGGVTSP 
1081 EDLWRLLAAG EDAVSGFPTD RGWNVTDSAT RRGGFLYDAG EFDAAFFGIS PREALVMDPQ 
1141 QRLLLETSWE ALERAGVSPG SLRGSDTAVY IGATAQDYGP RLHESDDDSG GYVLTGNTAS 
1201 VASGRIAYSL GLEGPAVTVD TACSSSLVAL HLAVQALRRG ECSLALAGGA TVMPSPGMFV 
1261 EFSRQGGLSE DGRCKAFAAT ADGTGWAEGV GVLLVERLSD ARRLGHRVLA WRGSAVNQD 
1321 GASNGLTAPN GPSQQRVIRA ALADAGLVPA DVDWEAHGT GTRLGDPIEA QALLATYGQG 
1381 RAGGRPVVLG SVKSNIGHTQ AAAGVAGVMK MVLALGRGW PKTLHVDEPS AHVDWSAGEV 
1441 ELAVEAVPWS RGGRVRRAGV SSFGISGTNA HVIVEEAPAE PEPEPERGPG SWGWPWVV 
1501 SGRDAGALRE QAARLAAHVS GVSAVDVGWS LVATRSVFEH RAVMVGSELD AMAESLAGFA 
1561 AGGWPGWS GVAPAEGRRV VFVFPGQGSQ WVGMAAGLLD ACPVFAEAVA ECAAVLDPLT 
1621 GWSLVEVLRG GGEAVLGRVD VVQPALWAVM VSLARTWRYY GVEPAAWGH SQGEIAAACV 
1681 AGGLSLADGA RWVLRSRAI ARIAGGGGMV SVSLPAGRVR TMLEEFDGRV SVAAVNGPSS 
1741 TWSGDVQAL DELLAGCERE GVRARRVPVD YASHSAQMDQ LRDDLLEALA TIVPTSANVP 
1801 FFSTVTADWL DTTALDAGYW FTNLRETVRF QEAVEGLVAQ GMGAFVECSP HPVLVPGITE 
1861 TLDTFDADAV ALSSLRRDEG GLDRFLTSLA EAFVQGVPVD WSRAFEGASP RTVDLPTYPF 
1921 QRQRYWLLDK AAQRERERLE DWRYHVEWRP VTTRPSARLS GVWAVAIPAR LARDSLLVGA 
1981 1DALERGGAR AVPWVDERD HDRQALVEAL RNGLGDDDLA GVLSLLALDE APHGDHPDVP 
2041 VGMAASLALV QAMADAAAEV PVWFATRGAV AALPGESPER PRQALLWGLG RWALEQPQI 
2101 WGGLVDLPQH LDEDAGRRLV DWGG LADED QLAVRASSVL ARRLVRTPGH RMSSQAGGRE 
2161 WSPSGTVLVT GGTGALGAHV ARWLAGKGAE HLVLISRRGA DAAGAAALRD SLTDMGVRVT 
2221 LAACDAADRH ALETLLDSLR TDPAQLTAVI HAAGALDDGM TTVLTPEQMN NALRAKVTAT 
2281 VNLHELTRDL DLSAFVLFSS ISATLGIPGQ ANYAPGNSFL DAFAEWRRAQ GLVATSIAWG 
2341 PWSGGTGMAH EGSVGERLQR HGVLAMEPAA AIAALDHTLA SDETAVAVAD IDWSRFFLAY 
2401 TALRARPLIG EIPEARRMLE SGSGPGDLEP DRAEPELAVR LAGLTAVEQE RLLVQLVREQ 
2461 AAWLGHSGA EAVAPDRAFK DLGFDSLTSV ELRNRLNTAT GLRLPVTAVF DYARPAALAG 
2521 HLRSRLIDDD GDHGALPGVE KHAIDEPIAI VGMACRFPGG IASPEDLWDV LTAGEDWSG 
2581 LPQNRGWDLG RLYDPDPDRA GTSYMREGAF LHEAGEFDAA FFGISPREAL AMDPQQRLLL 
2641 ETSWEALERA GITPSKLAGS PTGVFFGMSN QDYAAQAGDV PSELEGYLLT GSISSVASGR 
2701 VAYTFGLEGP AVTVDTACSS SLVALHLAVQ GLRRGECSLA LVGGVTVMSS PVTLTTFSRQ 
2761 RGLSVDGRCK AFAASADGFG AAEGVGVLLV ERLSDARRLG HRVLAWRGS AVNQDGASNG 
2821 LAAPNGPSQQ RVIRAALADA GLAPADVDW EAHGTGTRLG DPIEAQALLA TYGQGRTSGR 
2881 PVWLGSVKSN IGHTQAAAGV AGVMKMVLAL GRGWPKTLH VDEPSPHVDW SAGEVELAVE 
2941 AVPWSRGGRV RRAGVSSFGI SGTNAHVIVE EAPAEPSVEE GPGSWGWP WWSGRDAGA 
3001 LRAQAARLAA HVSSTGAGW DVGWSLVATR SVFEHRAVMV GTDLDSMAGS LAGFAAGGVV 
3061 PGWSGVAPA EGRRWFVFP GQGSQWVGMA AGLLDACPVF AEAVAECAAV LDRLTGWSLV 
3121 EVLRGGEAVL GRVDVVQPAL WAVMVSLART WRYYGVEPAA WGHSQGEIA AACVAGGLSL 
3181 ADGARVWLR SRAIARIAGG GGMVSVGLSA ERVRTMLDTY GGRVSVAAVN GPSSTVVSGD 
3241 AQALDELLAG CEREGVRARR VPVDYASHSA QMDQLRDELL EALADVTPQD SSVPFFSTVT 
3301 ADWLDTTALD AGYWFTNLRE TVRFQEAVEG LVAQGMGAFV ECSPHPVLVP GITETLDTFD 
3361 ADAVALSSLR RDEGGLDRFL TSLAEAFVQG VPVDWTHAFE GGRPRFVDLP TYAFQRQRYW 
3421 LHEEPLQEPV DEAWDAE FWS WERGDATAV S DLLS T DAE A LHTVLPALSS WRRRRVEHRR 
3481 LQDWRYRVEW KPFPAALDEV LGGGWLFWP RGLADDGWA RWAAVTARG GEVSWELDP 
3541 TRPDRRAYAE AVAGRGVSGV VSFLSWDDRR HSEHSWPAG LAASLVLAQA LVDLGRVGEG 
3601 PRLWLVTRGA WAGPSDAGV VIDPVQAQVW GFGRVLGLEH PELWGGLVDL PVGVDEEVCR 
3661 RFVGWASAG FEDQVAVRGS GVWVRRLVRA WDGGGGGWR PRGTVLVTGG LGGLGAHTAR 
3721 WLVGGGADHV VLVSRRGGSA PGAGDLVREL EGLGGARVSV RACDVADRVA LRALLSDLGE 
3781 PVTAVFHAAG VPQSTPLAEI SVQEAADVMA AKVAGAVNLG ELVDPCGLEA FVLFSSNAGV 
3841 WGSGGQAVYA AANAFLDALA VRRRGVGLPA TSVAWGMWAG EGMASVGGAA RELSRRGVRA 
3901 MDPERAVAVM ADAVGRGEAF VAVADVDWER FVTGFASARP RPLISDLPEV RAWEGQVCG 
3961 RGQGLGLVGE EESSGWLKRL SGLSRVRQEE ELVELVRAQA AVVLGHGSAQ DVPAERAFKE 

4021 LGFDSLTAVE LRNGLAAATG IRLPATMAFD HPTATAIARF LQSELVGSDD PLTLMRSAID 

4081 QLETGLALLE SDEEARSEIT KRLNILLPRF GSGGSSRGRE AGQDAGEHQD VEDATIDELF 
4141 EVLDNELGNS 

8,8a-deoxyoleandolide synthase 2: 

1 VTNDEKIVEY LKRATVDLRK ARHRIWELED EPIAITSMAC HFPGGIESPE QLWELLSAGG 
61 EVLSEFPDDR GWDLDEIYHP DPEHSGTSYV RHGGFLDHAT QFDTDFFGIS PREALAMDPQ 
121 QRLLLETSWQ LFERAGVDPH TLKGSRTGVF VGAAHMGYAD RVDTPPAEAE GYLLTGNASA 
181 WSGRISYTF GLEGPAVTVD TACSSSLVAL HLAVQALRRG ECSLAWGGV AVMSDPKVFV 
241 EFSRQRGLAR DGRSKAFAAS ADGFGFAEGV SLLLLERLSD ARRLGHRVLA VVRGSAVNQD 
301 GASNGLAAPN GPSQQRVIRA ALADAGLAPA DVDWEAHGT GTRLGDPIEA QALLATYGQG 
361 RTSGRPVWLG SVKSNIGHTQ AAAGVAGVMK MVLALERGW PKTLHVDEPS PHVDWSTGAV 
421 ELLTEERPWE PEAERLRRAG I SAFGVSGTN AHVIVEEAPA EPEPEPEPGT RWAAGDLVV 
481 PWWSGRDAG ALRAQAARLA AHVSSTGAGV VDVGWSLVAT RSVFEHRAVM VGTDLDSMAG 
541 SLAGFAAGGV VPGWSGVAP AEGRRWFVF PGQGSQWVGM AAGLLDACPV FAEAVAECAA 
601 VLDPLTGWSL VEVLRGGEAV LGRVDWQPA LWAVMVSLAR TWRYYGVEPA AWGHSQGEI 
661 AAACVAGGLS LADGARVWL RSRAIARIAG GGGMVSVSLP AGRVRTMLDT YGGRLSVAAV 
721 NGPSSTWSG DAQALDELLA GCEREGVRAR RVPVDYASHS AQMDQLRDEL LEALADITPQ 
781 HSSVPFFSTV TADWLDTTAL DAGYWFTNLR ETVRFQEAVE GLVAQGMGAF VECSPHPVLV 
841 PGIEQTLDTV EADAVALGSL RRDEGGLGRF LTSLAEAFVQ GVPVDWSRTF EGASPRTVDL 
901 PTYPFQRQRF WLEGSPALSS NGVEGEADVA FWDAVEREDS AWAEELGID AKALHMTLPA 
961 LSSWRRRERQ RRKVQRWRYR VEWKRLPNSR AQESLQGGWL LWPQGRAGD VRVTQSVAEV 
1021 AAKGGEATVL EVDALHPDRA AYAEALTRWP GVRGWSFLA WEEQALAEHP VLSAGLAASL 
1081 ALAQALIDVG GSGESAPRLW LVTEAAVVIG AADTGAVIDP VHAQLWGFGR VLALEHPELW 
1141 GGLIDLPAVA GEPGSITDHA HADLLATVLA TMVQAAARGE DQVAVRTTGT YVPRLVRSGG 
1201 SAHSGARRWQ PRDTVLVTGG MGPLTAHIVR WLADNGADQV VLLGGQGADG EAEALRAEFD 
1261 GHTTKIELAD VDTEDSDALR SLLDRTTGEH PLRAVIHAPT WEFASVAES DLVRFARTIS 
1321 SKIAGVEQLD EVLSGIDTAH DVVFFSSVAG VWGSAGQSAY AAGNAFLDAV AQHRRLRGLP 
1381 GTSVAWTPWD DDRSLASLGD SYLDRRGLRA LS IPG ALAS L QEVLDQDEVH AWADVDWER 
1441 FYAGFSAVRR TSFFDDVHDA HRPALSTAAT NDGQARDEDG GTELVRRLRP LTETEQQREL 
1501 VSLVQSEVAA VLGHSSTDAV QPQRAFREIG FDSLTAVQLR NRLTATTGMR LPTTLVFDYP 
1561 TTNGLAEYLR SELFGVSGAP ADLSWRNAD EEDDPWIVG MACRFPGGID TPEAFWKLLE 
1621 AGGDVISELP ANRGWDMERL LNPDPEAKGT SATRYGGFLY DAGEFDAAFF GISPREALAM 
1681 DPQQRLLLET VWELIESAGV APDSLHRSRT GTFIGSNGQF YAPLLWNSGG DLEGYQGVGN 
1741 AGSVMSGRVA YSLGLEGPAV TVDTACSSSL VALHLAVQAL RRGECSLAIA GGVTVMSTPD 
1801 SFVEFSRQQG LSEDGRCKAF ASTADGFGLA EGVSALLVER LSDARRLGHR VLAWRGSAV 
1861 NQDGASNGLT APNGPSQQRV IRAALADAGL APADVDWEA HGTGTRLGDP IEAQALLATY 
1921 GQGRAGGRPV VLGSVKSNIG HTQAAAGVAG VMKMVLALER GVVPKTLHVD EPSPHVDWSA 
1981 GEVELAVEAV PWSRGGRVRR AGVSSFGISG TNAHVIVEEA PAEPEPEPGT RWAAGDLVV 
2041 PWWSGRDAG ALREQAARLA AHVSSTGAGV VDVGWSLVAT RSVFEHRAVM VGSELDSMAE 
2101 SLAGFAAGGV VPGWSGVAP AEGRRWFVF PGQGSQWVGM AAGLLDACPV FAEAVAECAA 
2161 VLDPVTGWSL VEVLRGGGEA VLGRVDWQP ALWAVMVSLA RTWRYYGVEP AAWGHSQGE 
2221 IAAACVAGGL SLADGARWV LRSRAIARIA GGGGMVSVGL SAERVRTMLD TYGGRVSVAA 
2281 VNGPSSTVVS GDVQALDELL AGCEREGVRA RRVPVDYASH SAQMDQLRDE LLEALADITP 
2341 QHSSVPFFST VTADWLDTTA LDAGYWFTNL RETVRFQEAV EGLVAQGMGA FVECSPHPVL 
2401 VPGIEQTLDA LDQNAAVLGS LRRDEGGLDR LLTSLAEAFV QGVPVDWTHA FEGMTPRTVD 
2461 LPTYPFQRQH YWPKPAPAPG ANLGDVASVG LTAAGHPLLG AWEMPDSDG LVLTGQISLR 
2521 THPWLADHEV LGSVLLPGTA FVELAVQAAD RAGYDVLDEL TLEAPLVLPD RGGIQVRLAL 
2581 GPSEADGRRS LQLHSRPEEA AGFHRWTRHA SGFWPGGTG AARPTEPAGV WPPAGAEPVA 
2641 LASDRYARLV ERGYTYGPSF QGLHTAWRHG DDVYAEVALP EGTPADGYAL HPALLDAAVQ 
2701 AVGLGSFVED PGQVYLPFLW SDVTLHATGA TSLRVRVSPA GPDTVALALA DPAGAPVATV 
2761 GALRLRTTSA AQLARARGSA EHAMFRVEWV EEGSAADRCR GGAGGTTYEG ERAAEAGAAA 
2821 GTWAVLGPRV PAAVRTMGVD WTALDTPDH PADPQSLADL AALGDTVPDV WVTSLLSLA 
2881 SGADSPLGNR PRPTAAEQDT AATVAGVHSA LHAALDLVQA WLADERHTAS RLVLVTRHAM 
2941 TVAESDPEPD LLLAPVWGLV RSAQAENPGR FVLADIDGDE ASWDALPRAV ASAASEVAIR 
3001 AGAVYVPRLA RATDEGLWA DEAAGPWRLD VTEAGTLANL ALVPCPDASR PLGPDEVRIA 
3061 VRAAGVNFRD VLLALGMYPD EGLMGAEAAG WTEVGGGVT TLAPGDRVMG LVTGGFGPVA 
3121 VTHHRMLVRM PRGWSFAEAA SVPVAFLTAY YALHDLAGLR GGESVLVHSA AGGVGMAAVQ 
3181 LARHWDAEVF GTASKGKWDV LAAQGLDEEH IGSSRTTEFE QRFRATSGGR GIDWLNALS 
3241 GDFVDASARL LREGGRFVEM GKTDIRTDLG WGADGVPDI RYVAFDLAEA GAERIGQMLD 
3301 EIMALFDAGV LRLPPLRAWP VRRAHEALRF VSQARHVGKV VLTVPAALDA EGTVLITGAG 
3361 TLGALVARHL VTEHDVRRLL LVSRSGVAPD LAAELGALGA EVTVAACDVA NRKALKALLE 
3421 DIPPEHPVTG IVHTAGVLDD GWSGLTPER VDTVLKPKVD AALTLESVIG ELDLDPALFV 
3481 IFSSAASMLG GPGQGSYAAA NQFLDTLARH RARRGLTSVS LGWGLWHEAS GLTGGLADID 
3541 RDRMSRAGIA PMPTDEALHL FDRATELGDP VLLPMRLNEA ALEDRAADGT LPPLLSGLVR 
3601 VRHRPSARAG TATAAPATGP EAFARELAAA PDPRRALRDL VRGHVALVLG HSGPEAIDAE 
3661 QAFRDIGFDS LTAVELRNRL NAETGLRLPG TLVFDYPNPS ALADHLLELL APATQPTAAP 
3721 LLAELERVEQ LLSAAASPGG PASAVDEETR TLIATRLATL ASQWTHLPVG SPGNADNRSG 
3781 PGESGQAQES GATGEHTAAW TSDDDLFAFL DKRLET 

8,8a-deoxyoleandolide synthase 3: 

1 VAEAEKLREY LWRATTELKE VSDRLRETEE RAREPIAIVG MSCRFPGGGD ATVNTPEQFW 
61 DLLNSGGDGI AGLPEDRGWD LGRLYDPDPD RAGTSYVREG GFLYDSGEFD AAFFGISPRE 
121 ALAMDPQQRL LLETSWEAFE SAGIKRAALR GSDTGVYIGA WSTGYAGSPY RLVEGLEGQL 
181 AIGTTLGAAS GRVAYTFGLE GPAVTVDTAC SSSLVALHLA VQGLRRGECS LALVGGVTVM 
241 SSPVTLTTFS RQRGLSVDGR CKAFPASADG FGAAEGVGVL LVERLSDARR LGHRVLAVVR 
301 GSAVNQDGAS NGLTAPNGPS QQRVIRAALA DAGLAPADVD WEAHGTGTR LGDPIEAQAL 
361 LATYGQGRAG GRPVWLGSVK SNIGHTQAAA GVAGVMKMVL ALGRGWPKT LHVDEPSPHV 
421 DWSAGAVELL TEERPWEPEA ERLRRAGISA FGVSGTNAHV IVEEAPAEPE PEPGTRWAA 
481 GDLWPWVVS GRDARALRAQ AARLAAHVSG VSAVDVGWSL VATRSVFEHR AVAIGSELDS 
541 MAGS LAG FAA GGVVPGVVSG VAPAEGRRW FVFPGQGSQW VGMAAGLLDA CPVFAEAVAE 
601 CAAVLDPVTG WSLVEVLQGR DATVLGRVDV VQPALWAVMV SLARTWRYYG VEPAAWGHS 
661 QGEIAAACVA GGLSLADGAR VWLRSRAIA RIAGGGGMVS VSLPAGRVRT MLEEFDGRLS 
721 VAAVNGPSST VVSGDVQALD ELLAGCEREG VRARRVPVDY ASHSAQMDQL RDELLEALAD 
781 ITPQDSSVPF FSTVTADWLG TTALGAGYWF TNLRETVRFQ EAVEGLVAQG MGAFVECSPH 
841 PVLVPGIEQT LDALDQNAAV FGSLRRDEGG LDRFLTSLAE AFVQGVPVDW SRAFEGVTPR 
901 TVDLPTYPFQ RQHYWLMAEE APVSQPPHSE NSFWSWADA DAEAAAELLG VDVEAVEAVM 
961 PALSSWHRQS QLRAEVNQWR YDVAWKRLTT GALPEKPGNW LWTPAGTDT TFAESLARTA 
1021 AAELGVSVSF AQVDTAHPDR SQYAHALRQA LTGPEKVDHL VSLLALDQAT DDLAAAPSCL 
1081 AASLVLAQAL VDLGRVGEGP RLWLVTRGAV VAGPSDAGAV IDPVQAQVWG FGRVLGLEHP 
1141 ELWGGLIDLP VGVDEEVCRR FVGWASAGF EDQVAVRGSG VWVRRLVRAV VDGGGGGWRP 
1201 RGTVLVTGGL GGLGAHTARW LVGGGADHVV LVSRRGGSAP GAGDLVRELE GLGGARVSVR 
1261 ACDVADRVAL RALLS DLGEP VTAVFHAAGV PQSTPLAEIS VQEAADVMAA KVAGAVNLGE 
1321 LVDPCGLEAF VLFSSNAGVW GSGGQAVYAA ANAFLDALAV RRRGVGLPAT SVAWGMWAGE 
1381 GMASVGGAAR ELSRRGVRAM DPERAVAVMA DAVGRGEAFV AVADVDWERF VTGFASARPR 
1441 PLISDLPEVR AWEGQVQGR GQGLGLVGEE ESSGWLKRLS GLSRVRQEEE LVELVRAQAA 
1501 WLGHGSAQD VPAERAFKEL GFDSLTAVEL RNGLAAATGI RLPATMAFDH PNATAIARFL 
1561 QSQLLPDAES ESAVPSSPED EVRQALASLS LDQLfcGAGLL DPLLALTRLR EINSTVQNPE 
1621 PTTESIDEMD GETCCAWRSA KSTAEPLTTG ADMPDPTAKY VEALRASLKE NERLRQQNHS 
1681 LLAASREAIA ITAMSCRFGG GIDSPEDLWR FLAEGRDAVA GLPEDRGWDL DALYHPDPEN 
1741 PGTTYVREGA FRYDAAQFDA GFFGISPREA LAMDPQQRLL LETSWELFER ADIDPYTVRG 
1801 TATGIFIGAG HQGYGPDPKR APESVAGYLL TGTASAVLSG RISYTFGLEG PAVTVDTACS 
1861 SSLVALHLAV QALRRGECSL AIAGGVAVMS TPDAFVEFSR QQGMARDGRC KAFAAAADGM 
1921 GWGEGVSLLL LERLSDARRL GHRVLAWRG SAVNQDGASN GLAAPNGPSQ QRVIRAALAD 
1981 AGLAPADVDV VEAHGTGTRL GDPIEAQALL ATYGQGRAGG RPVWLGSVKS NIGHTQAAAG 
2041 VAGVMKMVLA LGRGWPKTL HVDEPSPHVD WSAGAVELLT EERPWEPEAE RLRRAGISAF 
2101 GVSGTNAHVI VEEAPAEPEP EPGTRWAAG DLWPWWSG RDVGALREQA ARLAAHVSST 
2161 GAGWDVGWS LVATRSVFEH RAVMVGTDLD SMAGSLAGFA AGGWPGWS GVAPAEGRRV 
2221 VFVFPGQGSQ WVGMAAGLLD ACPVFAEAVA ECAAVLDPVT GWSLVEVLQG RDATVLGRVD 
2281 WQPALWAVM VSLARTWRYY GVEPAAWGH SQGEIAAACV AGGLSLADGA RWVLRSRAI 
2341 ARIAGGGGMV SVSLPAGRVR TMLDTYGGRV SVAAVNGPSS TVVSGDVQAL DELLAGCERE 
2401 GVRARRVPVD YASHSAQMDQ LRDELLEALA DITPQDSSVP FFSTVTADWL DTTALDAGYW 
2461 FTNLRETVRF QEAVEGLVAQ GMGAFVECSP HPVLVPGIEQ TLDALDQNAA VLGSLRRDEG 
2521 GLDRLLTSLA EAFVQGVPVD WTHAFEGVTP RTVDLPTYPF QRQRFWLDGS PASSANGVDG 
2581 EADAMIWDAV EREDSVAVAE ELGIDAEALH TVLPALSSWR RRRVEHRRLQ DWRYRVEWKP 
2641 FPAALDEVLG GGWLFWPRG LADDGWARV VAAVTARGGE VSWELDPTR PDRRAYAEAV 
2701 AGRGVSGWS FLSWDDRRHS EHPWPAGLA ASLVLAQALV DLGRVGEGPR LWLVTRDAW 
2761 AGPSDAGAVI DPVQAQVWGF GRVLGLEHPE LWGGLIDLPV EAPEPGSTCD HTYADLLATV 
2821 VASAGFEDQV AVRGSGVWVR RLVRAVVDGG GGGWRPRGTV LVTGGLGGLG AHTARWLVGG 
2881 GADHVVLVSR RGGSAPGAGD LVRELEGLGG ARVSVRACDV ADRVALRALL SDLGEPVTAV 
2941 FHAAGVPQST PLAEISVQEA ADVMAAKVAG AVNLGELVDP CGLEAFVLFS SNAGVWGSGG 
3001 QAVYAAANAF LDALAVRRRG VGLPATSVAW GMWAGEGMAS VGGAARELSR RGVRAMDPER 
3061 AVAVMADAVG RGEAFVAVAD VDWERFVTGF ASARPRPLIS DLPEVRTALR NQEQEQLHAP 
3121 VPEDRSAQLL RRLSMLSPAG REAELVKLVR TEAAAVLGHG SAQDVPAERA FKELGFDSLT 
3181 AVQLRNRLAA ATGTRLPASA VFDHPHAAAL ARWLLAGMRH ADGGHGGGHA GGPGPDADEG 
3241 RSAGAGHSGM LADLYRRSAE LGRSREFIGL LADTAAFRPV FHGPADLDAP LEAVPLADGV 
3301 RKPQLICCSG TAPVGGPHEF ARLASFFRGT RAVSALPLPG YLPGEQLPAD LDAVLAAQAE 
3361 AVEKQTGGAP FVLVGYSAGG LMAHALACHL AGRGTPPSGE VLVDVYPPGR QEPVFGWQKE 
3421 LTEGMFAQDF VPMDDTRLTA LGTYDRLMGE WRPAPSGLPT LLIRATEPMA EWTGAIDWRA 
3481 SWEYDHTAVD MPGNHFTIMR EHAEDAARHI DVWLKGLTP 
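
The protein listings above follow a fixed layout: a residue position index followed by up to six blocks of ten residues. As a minimal sketch (the function names are illustrative and are not part of the disclosure), the following Python joins such a listing into one continuous sequence and flags non-final lines whose residue count differs from sixty, which may help locate transcription gaps in a printed copy.

```python
def parse_listing_line(line):
    """Split one listing line into (position index, residues without spaces)."""
    tokens = line.split()
    index_digits = []
    # The index may itself be split by stray spaces (e.g. "4 81"), so
    # consume every leading all-digit token before the residue blocks.
    while tokens and tokens[0].isdigit():
        index_digits.append(tokens.pop(0))
    position = int("".join(index_digits)) if index_digits else None
    return position, "".join(tokens)


def join_listing(lines):
    """Concatenate the residues of all non-blank listing lines."""
    return "".join(parse_listing_line(l)[1] for l in lines if l.strip())


def short_lines(lines, expected=60):
    """Return the indices of non-final lines that do not hold `expected` residues."""
    parsed = [parse_listing_line(l) for l in lines if l.strip()]
    return [pos for pos, residues in parsed[:-1] if len(residues) != expected]


example = [
    "1 VAEAEKLREY LWRATTELKE VSDRLRETEE RAREPIAIVG MSCRFPGGGD ATVNTPEQFW",
    "4 81 GDLWPWVVS GRDARALRAQ AARLAAHVSG VSAVDVGWSL VATRSVFEHR AVAIGSELDS",
    "3481 SWEYDHTAVD",
]
sequence = join_listing(example)
suspect = short_lines(example)
```

A flagged index only indicates that residues may be missing or merged on that line; it does not say which residues are affected.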
The recombinant DNA compounds of the invention that encode the 
oleandolide PKS proteins or portions thereof are useful in a variety of applications. 
While many of these applications relate to the heterologous expression of the 
oleandolide PKS or the construction of hybrid PKS enzymes, many useful 
applications involve the natural oleandomycin producer Streptomyces antibioticus. 

For example, one can use the recombinant DNA compounds of the invention 
to disrupt the oleAI, oleAII, or oleAIII genes by homologous recombination in 
Streptomyces antibioticus. The resulting host cell is a preferred host cell for making 
polyketides modified by oxidation, hydroxylation, and glycosylation in a manner 
similar to oleandomycin, because the genes that encode the proteins that perform 
these reactions are present in the host cell. Such a host cell also does not naturally 
produce any oleandomycin that could interfere with production or purification of the 
polyketide of interest. 

One illustrative recombinant host cell provided by the present invention 
expresses a recombinant oleandolide PKS in which the module 1 KS domain is 
inactivated by deletion or other mutation. In a preferred embodiment, the inactivation 
is mediated by a change in the KS domain that renders it incapable of binding 
substrate (the KS1° mutation). In a particularly preferred embodiment, this 
inactivation is rendered by a mutation in the codon for the active site cysteine that 
changes the codon to another codon, such as an alanine codon. Such constructs are 
especially useful when placed in translational reading frame with extender modules 1 
and 2 of an oleandolide PKS or the corresponding modules of another PKS. The utility of 
these constructs is that host cells expressing, or cell free extracts containing, a PKS 
comprising the protein encoded thereby can be fed or supplied with N-acylcysteamine 
thioesters of precursor molecules to prepare a polyketide of interest. See U.S. patent 
application Serial No. 60/117,384, filed 27 Jan. 1999, and PCT patent publication No. 
US99/03986, both of which are incorporated herein by reference. Such KS1° 
constructs of the invention are useful in the production of 
13-substituted-oleandomycin compounds in Streptomyces antibioticus host cells. Preferred 
compounds of the invention include those compounds in which the substituent at the 
13-position is propyl, vinyl, propargyl, other lower alkyl, and substituted alkyl. 
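
The active-site codon change described above can be illustrated with a short sketch. The toy coding sequence, the residue number, and the choice of GCC as the replacement alanine codon are assumptions made purely for illustration; they are not taken from the oleAI sequence.

```python
CYS_CODONS = {"TGC", "TGT"}
ALA_CODON = "GCC"  # assumption: any of the four alanine codons (GCN) would serve


def replace_codon(cds, residue_number, new_codon=ALA_CODON):
    """Return `cds` with the cysteine codon at `residue_number` (1-based) replaced.

    Raises ValueError if the targeted codon does not encode cysteine, so an
    off-by-one residue number is caught rather than silently mutating the
    wrong position.
    """
    start = 3 * (residue_number - 1)
    old_codon = cds[start:start + 3].upper()
    if old_codon not in CYS_CODONS:
        raise ValueError(
            f"residue {residue_number} is encoded by {old_codon!r}, not a cysteine codon"
        )
    return cds[:start] + new_codon + cds[start + 3:]


# Hypothetical six-codon fragment encoding D-T-A-C-S-S, the motif that
# surrounds the cysteine in the KS domains listed above.
toy_cds = "GACACCGCCTGCTCCTCC"
mutant = replace_codon(toy_cds, 4)  # Cys -> Ala; the reading frame is preserved
```

Because the substitution replaces three bases with three bases, downstream modules remain in the same translational reading frame.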

The compounds of the invention can also be used to construct recombinant 
host cells of the invention in which coding sequences for one or more domains or 
modules of the oleandolide PKS have been deleted by homologous recombination 
with the Streptomyces antibioticus chromosomal DNA. Those of skill in the art will 
appreciate that such compounds are characterized by their homology with the 
chromosomal DNA and not by encoding a functional protein due to their intended 
function of deleting or otherwise altering portions of chromosomal DNA. For this and 
a variety of other applications, the compounds of the present invention include not 
only those DNA compounds that encode functional proteins but also those DNA 
compounds that are complementary or identical to any portion of the oleandolide PKS 
genes. 
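
The role of homology in such deletion constructs can be sketched as follows. This is a simplified illustration that assumes a double crossover between two flanking homology arms; the function names, coordinates, and arm length are hypothetical, and arms used in practice would be far longer than in this toy example.

```python
def homology_arms(chromosome, del_start, del_end, arm_len):
    """Return the (left, right) arms flanking the half-open region [del_start, del_end)."""
    if not (0 <= del_start < del_end <= len(chromosome)):
        raise ValueError("deletion interval must lie within the chromosome")
    left = chromosome[max(0, del_start - arm_len):del_start]
    right = chromosome[del_end:del_end + arm_len]
    return left, right


def deletion_product(chromosome, del_start, del_end):
    """Sequence expected after a double crossover removes [del_start, del_end)."""
    return chromosome[:del_start] + chromosome[del_end:]


toy_chromosome = "AAAACCCCGGGGTTTT"
left_arm, right_arm = homology_arms(toy_chromosome, 4, 12, arm_len=4)
product = deletion_product(toy_chromosome, 4, 12)
```

The construct here is defined only by its identity to the flanking chromosomal DNA; it need not encode any functional protein.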

Thus, the invention provides a variety of modified Streptomyces antibioticus 
host cells in which one or more of the genes in the oleandolide PKS gene cluster have 
been mutated or disrupted. These cells are especially useful when it is desired to 
replace the disrupted function with a gene product expressed by a recombinant DNA 
expression vector. While such expression vectors of the invention are described in 
more detail in the following Section, those of skill in the art will appreciate that the 
vectors have application to S. antibioticus as well. Such S. antibioticus host cells can 
be preferred host cells for expressing oleandolide derivatives of the invention. 
Particularly preferred host cells of this type include those in which the coding 
sequence for the loading module has been mutated or disrupted, those in which one or 
more of any of the PKS gene ORFs has been mutated or disrupted, and/or those in 
which the genes for one or more oleandolide modification enzymes (glycosylation, 
epoxidation) have been mutated or disrupted. 

While the present invention provides many useful compounds having 
application to, and recombinant host cells derived from, Streptomyces antibioticus, 
many important applications of the present invention relate to the heterologous 
expression of all or a portion of the oleandolide PKS genes in cells other than S. 
antibioticus, as described in the following Section. 

Section II: Heterologous Expression of the Oleandolide PKS 
In one important embodiment, the invention provides methods for the 
heterologous expression of one or more of the oleandolide PKS genes and 
recombinant DNA expression vectors useful in the method. For purposes of the 
invention, any host cell other than Streptomyces antibioticus is a heterologous host 
cell. Thus, included within the scope of the invention in addition to isolated nucleic 
acids encoding domains, modules, or proteins of the oleandolide PKS, are 
recombinant expression vectors that include such nucleic acids. The term expression 
vector refers to a nucleic acid that can be introduced into a host cell or cell-free 
transcription and translation system. An expression vector can be maintained 
permanently or transiently in a cell, whether as part of the chromosomal or other 
DNA in the cell or in any cellular compartment, such as a replicating vector in the 
cytoplasm. An expression vector also comprises a promoter that drives expression of 
an RNA, which is translated into a polypeptide in the cell or cell extract. For efficient 
translation of RNA into protein, the expression vector also typically contains a 
ribosome-binding site sequence positioned upstream of the start codon of the coding 
sequence of the gene to be expressed. Other elements, such as enhancers, secretion 
signal sequences, transcription termination sequences, and one or more marker genes 
by which host cells containing the vector can be identified and/or selected, may also 
be present in an expression vector. Selectable markers, i.e., genes that confer 
antibiotic resistance or sensitivity, are preferred and confer a selectable phenotype on 
transformed cells when the cells are grown in an appropriate selective medium. 
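
The arrangement of elements described above (promoter, ribosome-binding site upstream of the start codon, coding sequence, optional terminator) can be modelled with a small data structure. This is an illustrative sketch only: the class name, the toy sequences, and the set of accepted start codons are assumptions and do not describe any particular vector of the invention.

```python
from dataclasses import dataclass

# Assumption: start codons commonly used in bacteria.
START_CODONS = ("ATG", "GTG", "TTG")


@dataclass
class ExpressionCassette:
    promoter: str
    rbs: str
    cds: str
    terminator: str = ""

    def assemble(self):
        """Concatenate the elements in 5' to 3' order."""
        return self.promoter + self.rbs + self.cds + self.terminator

    def start_codon_offset(self):
        """0-based offset of the start codon within the assembled cassette."""
        return len(self.promoter) + len(self.rbs)

    def is_well_formed(self):
        """True if an RBS is present and the coding sequence opens with a start codon."""
        return bool(self.rbs) and self.cds[:3].upper() in START_CODONS


# Toy sequences, far shorter than real elements.
cassette = ExpressionCassette(
    promoter="TTGACA", rbs="GGAGG", cds="ATGAAATGA", terminator="TTTT"
)
assembled = cassette.assemble()
```

Marker genes and other optional elements could be added as further fields in the same way.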

The various components of an expression vector can vary widely, depending 
on the intended use of the vector and especially the host cell(s) in which the vector is 
intended to replicate or drive expression. Expression vector components suitable for 
the expression of genes and maintenance of vectors in E. coli, yeast, Streptomyces, 
and other commonly used cells are widely known and commercially available. For 
example, suitable promoters for inclusion in the expression vectors of the invention 
include those that function in eucaryotic or procaryotic host cells. Promoters can 
comprise regulatory sequences that allow for regulation of expression relative to the 
growth of the host cell or that cause the expression of a gene to be turned on or off in 
response to a chemical or physical stimulus. For E. coli and certain other bacterial 
host cells, promoters derived from genes for biosynthetic enzymes, 
antibiotic-resistance conferring enzymes, and phage proteins can be used and include, for 
example, the galactose, lactose (lac), maltose, tryptophan (trp), beta-lactamase (bla), 
bacteriophage lambda PL, and T5 promoters. In addition, synthetic promoters, such as 
the tac promoter (U.S. Patent No. 4,551,433), can also be used. 
Thus, recombinant expression vectors contain at least one expression system, 
which, in turn, is composed of at least a portion of the oleandolide PKS coding 
sequences operably linked to a promoter and optionally termination sequences that 
operate to effect expression of the coding sequence in compatible host cells. The host 
cells are modified by transformation with the recombinant DNA expression vectors of 
the invention to contain the expression system sequences either as extrachromosomal 
elements or integrated into the chromosome. The resulting host cells of the invention 
are useful in methods to produce PKS and post-PKS tailoring (modification) enzymes 
as well as polyketides and antibiotics and other useful compounds derived therefrom. 

Preferred host cells for purposes of selecting vector components for expression 
vectors of the present invention include fungal host cells such as yeast and procaryotic 
host cells such as E. coli and Streptomyces, but mammalian cell cultures can also be 
used. In hosts such as yeasts, plants, or mammalian cells that ordinarily do not 
produce modular polyketide synthase enzymes, it may be necessary to provide, also 
typically by recombinant means, suitable holo-ACP synthases to convert the 
recombinantly produced PKS to functionality. Provision of such enzymes is 
described, for example, in PCT publication Nos. WO 97/13845 and 98/27203, each of 
which is incorporated herein by reference. Particularly preferred host cells for 
purposes of the present invention are Streptomyces and Saccharopolyspora host cells, 
as discussed in greater detail below. 

In a preferred embodiment, the expression vectors of the invention are used to 
construct a heterologous recombinant Streptomyces host cell that expresses a 
recombinant PKS of the invention. Streptomyces is a convenient host for expressing 
polyketides, because polyketides are naturally produced in certain Streptomyces 
species, and Streptomyces cells generally produce the precursors needed to form the 
desired polyketide. Those of skill in the art will recognize that, if a Streptomyces host 
cell produces any portion of a PKS enzyme or produces a polyketide-modifying 
enzyme, the recombinant vector need drive expression of only those genes 
constituting the remainder of the desired PKS enzyme or other polyketide-modifying 
enzymes. Thus, such a vector may comprise only a single ORF, with the desired 
remainder of the polypeptides constituting the PKS provided by the genes on the host 
cell chromosomal DNA. If a Streptomyces or other host cell ordinarily produces 
polyketides, it may be desirable to modify the host so as to prevent the production of 
endogenous polyketides prior to its use to express a recombinant PKS of the 
invention. Such modified hosts include S. coelicolor CH999 and similarly modified 
S. lividans described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 
95/08548 and WO 96/40968, incorporated herein by reference. In such hosts, it may 
not be necessary to provide enzymatic activities for all of the desired 
post-translational modifications of the enzymes that make up the recombinantly produced 
PKS, because the host naturally expresses such enzymes. In particular, these hosts 
generally contain holo-ACP synthases that provide the pantetheinyl residue needed 
for functionality of the PKS. 

The invention provides a wide variety of expression vectors for use in 
Streptomyces. The replicating expression vectors of the present invention include, for 
example and without limitation, those that comprise an origin of replication from a 
low copy number vector, such as SCP2* (see Hopwood et al., Genetic Manipulation 
of Streptomyces: A Laboratory Manual (The John Innes Foundation, Norwich, U.K., 
1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and Melton, 1988, Gene 
65: 83-91, each of which is incorporated herein by reference), SLP1.2 (Thompson et 
al., 1982, Gene 20: 51-62, incorporated herein by reference), and pSG5(ts) (Muth et 
al., 1989, Mol. Gen. Genet. 219: 341-348, and Bierman et al., 1992, Gene 116: 43-49, 
each of which is incorporated herein by reference), or a high copy number vector, 
such as pIJ101 and pJV1 (see Katz et al., 1983, J. Gen. Microbiol. 129: 2703-2714; 
Vara et al., 1989, J. Bacteriol. 171: 5782-5781; and Servin-Gonzalez, 1993, Plasmid 
30: 131-140, each of which is incorporated herein by reference). High copy number 
vectors are generally, however, not preferred for expression of large genes or multiple 
genes. For non-replicating and integrating vectors and generally for any vector, it is 
useful to include at least an E. coli origin of replication, such as from pUC, p1P, p1I, 
and pBR. For phage based vectors, the phage phiC31 and its derivative KC515 can be 
employed (see Hopwood et al., supra). Also, plasmid pSET152, plasmid pSAM, 
plasmids pSE101 and pSE211, all of which integrate site-specifically in the 
chromosomal DNA of S. lividans, can be employed for purposes of the present 
invention. 

The Streptomyces recombinant expression vectors of the invention typically 
comprise one or more selectable markers, including antibiotic resistance conferring 
genes selected from the group consisting of the ermE (confers resistance to 
erythromycin and lincomycin), tsr (confers resistance to thiostrepton), aadA (confers 
resistance to spectinomycin and streptomycin), aacC4 (confers resistance to 
apramycin, kanamycin, gentamicin, geneticin (G418), and neomycin), hyg (confers 
resistance to hygromycin), and vph (confers resistance to viomycin) resistance 
conferring genes. Alternatively, several polyketides are naturally colored, and this 
characteristic can provide a built-in marker for identifying cells. 
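
For convenience, the marker genes and the resistances attributed to them above can be collected in a lookup table. The sketch below simply restates that list; the dictionary and function names are illustrative.

```python
# Marker gene -> antibiotics to which it confers resistance, as listed above.
RESISTANCE_MARKERS = {
    "ermE": ("erythromycin", "lincomycin"),
    "tsr": ("thiostrepton",),
    "aadA": ("spectinomycin", "streptomycin"),
    "aacC4": ("apramycin", "kanamycin", "gentamicin", "geneticin (G418)", "neomycin"),
    "hyg": ("hygromycin",),
    "vph": ("viomycin",),
}


def markers_for(antibiotic):
    """Return the marker genes listed as conferring resistance to `antibiotic`."""
    wanted = antibiotic.lower()
    return sorted(
        gene
        for gene, drugs in RESISTANCE_MARKERS.items()
        if wanted in (drug.lower() for drug in drugs)
    )


thiostrepton_markers = markers_for("thiostrepton")
apramycin_markers = markers_for("Apramycin")
```

An empty result means only that the antibiotic is not named in the list above, not that no suitable marker exists.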

Preferred Streptomyces host cell/vector combinations of the invention include 
S. coelicolor CH999 and S. lividans K4-114 and K4-155 host cells, which have been 
modified so as not to produce the polyketide actinorhodin, and expression vectors 
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent No. 5,830,750 
and U.S. patent application Serial Nos. 08/828,898, filed 31 Mar. 1997, and 
09/181,833, filed 28 Oct. 1998, each of which is incorporated herein by reference. 
These vectors are particularly preferred in that they contain promoters compatible 
with numerous and diverse Streptomyces spp. Particularly useful promoters for 
Streptomyces host cells include those from PKS gene clusters that result in the 
production of polyketides as secondary metabolites, including promoters from 
aromatic (Type II) PKS gene clusters. Examples of Type II PKS gene cluster 
promoters are act gene promoters and tcm gene promoters; examples of Type I PKS 
gene cluster promoters are the spiramycin PKS and DEBS gene promoters. The present 
invention also provides the oleandolide PKS gene promoter in recombinant form. The 
promoter for the oleA genes is located upstream of the oleAI gene on cosmid 
pKOS055-5 of the invention. This promoter is contained within an ~1 kb segment 
upstream of the oleAI coding sequence and can be used to drive expression of the 
oleandolide PKS or any other coding sequence of interest in host cells in which the 
promoter functions, particularly S. antibioticus and generally any Streptomyces 
species. 

As described above, particularly useful control sequences are those that alone 
or together with suitable regulatory systems activate expression during transition from 
growth to stationary phase in the vegetative mycelium. The promoter contained in the 
aforementioned plasmid pRM5, i.e., the actI/actIII promoter pair and the actII-ORF4 
activator gene, is particularly preferred. Other useful Streptomyces promoters include 
without limitation those from the ermE gene and the melC1 gene, which act 
constitutively, and the tipA gene and the merA gene, which can be induced at any 
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growth stage. In addition, the T7 RNA polymerase system has been transferred to 
Streptomyces and can be employed in the vectors and host cells of the invention. In 
this system, the coding sequence for the T7 RNA polymerase is inserted into a neutral 
site of the chromosome or in a vector under the control of the inducible merA 
5 promoter, and the gene of interest is placed under the control of the T7 promoter. As 
noted above, one or more activator genes can also be employed to activate initiation 
of transcription at promoter sequences. Activator genes in addition to the actII-ORF4 
gene described above include dnrl y redD, and ptpA genes (see U.S. patent application 
Serial No. 09/181,833, supra). 

To provide a preferred host cell and vector for purposes of the invention, the
oleandolide PKS genes were placed on a recombinant expression vector that was
transferred to the non-macrolide producing host Streptomyces lividans K4-114, as
described in Example 4. Transformation of S. lividans K4-114 (strain K4-155 can also
be used) with this expression vector resulted in a strain which produced detectable
amounts of 8,8a-deoxyoleandolide as determined by analysis of extracts by LC/MS.

Moreover, and as noted in the preceding Section, the present invention also 
provides recombinant DNA compounds in which the encoded oleandolide module 1 
KS domain is inactivated or absent altogether. Example 4 below describes the 
introduction into Streptomyces lividans of a recombinant expression vector of the
invention that encodes an oleandolide PKS with a KS1° domain. The resulting host
cells can be fed or supplied with N-acylcysteamine thioesters of precursor molecules
to prepare oleandolide derivatives. Such cells of the invention are especially useful in
the production of 13-substituted-6-deoxyerythronolide B compounds in recombinant
host cells. Preferred compounds of the invention include those compounds in which
the substituent at the 13-position is propyl, vinyl, propargyl, other lower alkyl, and
substituted alkyl. The unmodified polyketides, called macrolide aglycones, produced
in S. lividans K4-114 or K4-155 can be hydroxylated and glycosylated by adding
them to the fermentation of a strain, such as, for example, S. antibioticus or
Saccharopolyspora erythraea, that contains the requisite modification enzymes.

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have, useful
activities. For example, Saccharopolyspora erythraea can convert 6-dEB and
oleandolide to a variety of useful compounds. The erythronolide 6-dEB is converted
by the eryF gene product to erythronolide B, which is, in turn, glycosylated by the
eryB gene product to obtain 3-O-mycarosylerythronolide B, which contains
L-mycarose at C-3. The eryC gene product then converts this compound to
erythromycin D by glycosylation with D-desosamine at C-5. Erythromycin D,
therefore, differs from 6-dEB through glycosylation and by the addition of a hydroxyl
group at C-6. Erythromycin D can be converted to erythromycin B in a reaction
catalyzed by the eryG gene product by methylating the L-mycarose residue at C-3.
Erythromycin D is converted to erythromycin C by the addition of a hydroxyl group
at C-12 in a reaction catalyzed by the eryK gene product. Erythromycin A is obtained
from erythromycin C by methylation of the mycarose residue in a reaction catalyzed
by the eryG gene product.
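
The conversions just described form a small pathway that can be written down as a
lookup table. The following Python sketch is illustrative only and is not part of the
disclosed methods. The names TAILORING_STEPS and route are hypothetical, and the
table restates only the steps named in the preceding paragraph. The sketch lists, in
order, the gene products needed to convert one compound into another, and shows how
treating a gene as mutated closes a route.

```python
from collections import deque

# substrate -> list of (gene product, resulting compound), as stated in the text
TAILORING_STEPS = {
    "6-dEB": [("eryF", "erythronolide B")],
    "erythronolide B": [("eryB", "3-O-mycarosylerythronolide B")],
    "3-O-mycarosylerythronolide B": [("eryC", "erythromycin D")],
    "erythromycin D": [("eryG", "erythromycin B"), ("eryK", "erythromycin C")],
    "erythromycin C": [("eryG", "erythromycin A")],
}


def route(start, target, blocked=frozenset()):
    """Breadth-first search over TAILORING_STEPS.

    `blocked` is a set of gene names treated as inactive (mutated).
    Returns the ordered gene products needed, or None if no route exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, genes = queue.popleft()
        if compound == target:
            return genes
        for gene, product in TAILORING_STEPS.get(compound, []):
            if gene in blocked or product in seen:
                continue
            seen.add(product)
            queue.append((product, genes + [gene]))
    return None


print(route("6-dEB", "erythromycin A"))
# prints ['eryF', 'eryB', 'eryC', 'eryK', 'eryG']
print(route("6-dEB", "erythromycin A", blocked={"eryK"}))
# prints None
```

In this toy model, blocking eryK leaves erythromycin D and erythromycin B reachable
but not erythromycins C or A.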

The unmodified oleandolide compounds provided by the present invention,
such as, for example, the oleandolide produced in Streptomyces lividans, can be
provided to cultures of Saccharopolyspora erythraea and converted to the
corresponding derivatives of erythromycins A, B, C, and D in accordance with the
procedure provided in Example 6, below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce 6-dEB
but can still carry out the desired conversions (Weber et al., 1985, J. Bacteriol.
164(1): 425-433). Also, one can employ other mutant strains, such as eryB, eryC,
eryG, and/or eryK mutants, or mutant strains having mutations in multiple genes, to
accumulate a preferred compound. The conversion can also be carried out in large
fermentors for commercial production.

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described above,
the organisms can be mutants unable to produce the polyketide normally produced in
that organism, the fermentation can be carried out on plates or in large fermentors,
and the compounds produced can be chemically altered after fermentation. Thus,
Streptomyces venezuelae, which produces picromycin, contains enzymes that can
transfer a desosaminyl group to the C-5 hydroxyl and a hydroxyl group to the C-12
position. In addition, S. venezuelae contains a glucosylation activity that glucosylates
the 2'-hydroxyl group of the desosamine sugar. This latter modification reduces
antibiotic activity, but the glucosyl residue is removed by cellular enzymatic action.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide aglycones
of the invention by action of the enzymes endogenous to S. narbonensis and S.
venezuelae.

Other organisms suitable for making compounds of the invention include
Streptomyces antibioticus (discussed in the preceding Section), Micromonospora
megalomicea, S. fradiae, and S. thermotolerans. M. megalomicea produces
megalomicin and contains enzymes that hydroxylate the C-6 and C-12 positions and
glycosylate the C-3 hydroxyl with mycarose, the C-5 hydroxyl with desosamine, and
the C-6 hydroxyl with megosamine (also known as rhodosamine), as well as acylating
various positions. In addition to antibiotic activity, compounds of the invention
produced by treatment with M. megalomicea enzymes can have antiparasitic activity
as well. S. fradiae contains enzymes that glycosylate the C-5 hydroxyl with
mycaminose and then the 4'-hydroxyl of mycaminose with mycarose, forming a
disaccharide. S. thermotolerans contains the same activities as well as acylation
activities. Thus, the present invention provides the compounds produced by
hydroxylation and glycosylation of the macrolide aglycones of the invention by action
of the enzymes endogenous to S. antibioticus, M. megalomicea, S. fradiae, and S.
thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention directly
in the host cell of interest. Thus, the recombinant genes of the invention, which
include recombinant oleAI, oleAII, and oleAIII genes with one or more deletions
and/or insertions, including replacements of an oleA gene fragment with a gene
fragment from a heterologous PKS gene (as discussed in the next Section), can be
included on expression vectors suitable for expression of the encoded gene products
in Saccharopolyspora erythraea, Streptomyces antibioticus, S. venezuelae, S.
narbonensis, Micromonospora megalomicea, S. fradiae, and S. thermotolerans. A
number of erythromycin high-producing strains of S. erythraea have been developed,
and in a preferred embodiment, the oleandolide PKS genes are introduced into such
strains (or erythromycin non-producing mutants thereof) to provide the corresponding
modified oleandolide compounds in high yields.

Moreover, additional recombinant gene products can be expressed in the host
cell to improve production of a desired polyketide. As but one non-limiting example,
certain recombinant PKS proteins of the invention may produce a polyketide other
than or in addition to the predicted polyketide, because the polyketide is cleaved from
the PKS by the thioesterase (TE) domain in module 6 prior to processing by other
domains on the PKS, in particular, any KR, DH, and/or ER domains in module 6. The
production of the predicted polyketide can be increased in such instances by deleting
the TE domain coding sequences from the gene and, optionally, expressing the TE
domain as a separate protein. See Gokhale et al., Feb. 1999, "Mechanism and
specificity of the terminal thioesterase domain from the erythromycin polyketide
synthase" Chem. & Biol. 6: 117-125, incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods, 
expression vectors, and recombinant host cells that enable the production of 
oleandolide and hydroxylated and glycosylated derivatives of oleandolide in 
heterologous host cells. The present invention also provides methods for making a 
wide variety of polyketides derived in part from the oleandolide PKS, as described in 
the following Section. 

Section III: Hybrid PKS Genes 

The present invention provides recombinant DNA compounds encoding each 
of the domains of each of the modules of the oleandolide PKS. The availability of 
these compounds permits their use in recombinant procedures for production of 
desired portions of the oleandolide PKS fused to or expressed in conjunction with all 
or a portion of a heterologous PKS. The resulting hybrid PKS can then be expressed 
in a host cell to produce a desired polyketide. 

Thus, in accordance with the methods of the invention, a portion of the 
oleandolide PKS coding sequence that encodes a particular activity can be isolated 
and manipulated, for example, to replace the corresponding region in a different 
modular PKS. In addition, coding sequences for individual modules of the PKS can 
be ligated into suitable expression systems and used to produce the portion of the 
protein encoded. The resulting protein can be isolated and purified or can be
employed in situ to effect polyketide synthesis. Depending on the host for the
recombinant production of the domain, module, protein, or combination of proteins,
suitable control sequences such as promoters, termination sequences, enhancers, and
the like are ligated to the nucleotide sequence encoding the desired protein in the
construction of the expression vector, as described in the preceding Section.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a recombinant
PKS that comprises all or part of one or more extender modules, loading module,
and/or thioesterase/cyclase domain of a first PKS and all or part of one or more
extender modules, loading module, and/or thioesterase/cyclase domain of a second
PKS. In one preferred embodiment, the first PKS is most but not all of the oleandolide
PKS, and the second PKS is only a portion or all of a non-oleandolide PKS. An
illustrative example of such a hybrid PKS includes an oleandolide PKS in which the
oleandolide PKS loading module has been replaced with a loading module of another
PKS. Another example of such a hybrid PKS is an oleandolide PKS in which the AT
domain of extender module 3 is replaced with an AT domain that binds only malonyl
CoA. In another preferred embodiment, the first PKS is most but not all of a
non-oleandolide PKS, and the second PKS is only a portion or all of the oleandolide
PKS. An illustrative example of such a hybrid PKS includes a rapamycin PKS in which
an AT specific for malonyl CoA is replaced with the AT from the oleandolide PKS
specific for methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are
described below.

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines its
specificity. See PCT patent application No. WO US99/15047, and Lau et al., infra,
incorporated herein by reference. The state of the art in DNA synthesis allows the
artisan to construct de novo DNA compounds of size sufficient to construct a useful
portion of a PKS module or domain. Thus, the desired derivative coding sequences
can be synthesized using standard solid phase synthesis methods such as those
described by Jaye et al., 1984, J. Biol. Chem. 259: 6331, and instruments for
automated synthesis are available commercially from, for example, Applied
Biosystems, Inc. For purposes of the invention, such synthetic DNA compounds are
deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one can
better appreciate the benefit provided by the DNA compounds of the invention that
encode the individual domains, modules, and proteins that comprise the oleandolide
PKS. As described above, the oleandolide PKS is comprised of a loading module, six
extender modules, each composed of KS, AT, and ACP domains and, where present,
KR, DH, and ER domains, and a thioesterase domain. The DNA compounds of the
invention that encode these domains individually or in combination are useful in the
construction of the hybrid PKS encoding DNA compounds of the invention.
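
The architecture summarized above can be recorded as a simple data structure. The
following Python sketch is illustrative only. The domain content of each module is
read from the module-by-module descriptions in this Section. The class name Module,
the tuple OLEANDOLIDE_PKS, the label "KR0" for the inactive KR of extender module 3,
the label "KSQ" for the loading module KS, and the left-to-right domain order are
notational assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Module:
    name: str
    domains: Tuple[str, ...]            # assumed order of domains in the module
    at_substrate: Optional[str] = None  # specificity of the AT domain


MM = "methylmalonyl CoA"

# "KR0" is a hypothetical label for the inactive KR of extender module 3.
OLEANDOLIDE_PKS = (
    Module("loading", ("KSQ", "AT", "ACP"), "malonyl CoA"),
    Module("extender 1", ("KS", "AT", "KR", "ACP"), MM),
    Module("extender 2", ("KS", "AT", "KR", "ACP"), MM),
    Module("extender 3", ("KS", "AT", "KR0", "ACP"), MM),
    Module("extender 4", ("KS", "AT", "DH", "ER", "KR", "ACP"), MM),
    Module("extender 5", ("KS", "AT", "KR", "ACP"), MM),
    Module("extender 6", ("KS", "AT", "KR", "ACP"), MM),
)
TERMINAL_DOMAIN = "TE"  # thioesterase following extender module 6


def active_reductive_domains(module: Module) -> Tuple[str, ...]:
    """Return the active KR, DH, and ER domains of a module, in order."""
    return tuple(d for d in module.domains if d in ("KR", "DH", "ER"))


for module in OLEANDOLIDE_PKS:
    print(module.name, module.at_substrate, active_reductive_domains(module))
```

A table of this kind makes it easy to see which modules carry which domains before
any replacement is planned.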

The recombinant DNA compounds of the invention that encode the loading 
module of the oleandolide PKS and the corresponding polypeptides encoded thereby 
are useful for a variety of applications. In one embodiment, a DNA compound 
comprising a sequence that encodes the oleandolide PKS loading module is inserted 
into a DNA compound that comprises the coding sequence for a heterologous PKS 
protein or portion thereof. The resulting construct, in which the coding sequence for
the loading module of the heterologous PKS is replaced by the coding sequence for
the oleandolide PKS loading module, provides a novel PKS. Examples of suitable
heterologous PKS coding sequences include the 6-deoxyerythronolide B, rapamycin,
FK-506, FK-520, rifamycin, and avermectin PKS protein coding sequences. In another
embodiment, a DNA compound
comprising a sequence that encodes the oleandolide PKS loading module is inserted 
into a DNA compound that comprises the coding sequence for the oleandolide PKS or 
a recombinant oleandolide PKS that produces an oleandolide derivative. 

In another embodiment, a portion of the loading module coding sequence is 
utilized in conjunction with a heterologous coding sequence. In this embodiment, the
invention provides, for example, replacing the malonyl CoA (acetyl CoA) specific AT
with a propionyl CoA (methylmalonyl), butyryl CoA (ethylmalonyl), or other CoA
specific AT. In addition, the KSQ and/or ACP can be replaced by another inactivated
KS and/or another ACP. Alternatively, the KSQ and AT of the loading module can be
replaced by an AT of a loading module such as that of DEBS. The resulting 
heterologous loading module coding sequence can be utilized in conjunction with a 
coding sequence for a PKS that synthesizes oleandolide, an oleandolide derivative, or 
another polyketide. 

The recombinant DNA compounds of the invention that encode the first
extender module of the oleandolide PKS and the corresponding polypeptides encoded
thereby are useful for a variety of applications. In one embodiment, a DNA compound
comprising a sequence that encodes the oleandolide PKS first extender module is
inserted into a DNA compound that comprises the coding sequence for a heterologous
PKS. The resulting construct, in which the coding sequence for a module of the
heterologous PKS is either replaced by that for the first extender module of the
oleandolide PKS or the latter is merely added to coding sequences for modules of the
heterologous PKS, provides a novel PKS coding sequence. In another embodiment, a
DNA compound comprising a sequence that encodes the first extender module of the
oleandolide PKS is inserted into a DNA compound that comprises coding sequences
for the oleandolide PKS or a recombinant oleandolide PKS that produces an
oleandolide derivative.

In another embodiment, a portion or all of the first extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the KR;
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH and
KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced with
another KS and/or ACP. In each of these replacements or insertions, the heterologous
KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence
for another module of the oleandolide PKS, from a gene for a PKS that produces a
polyketide other than oleandolide, or from chemical synthesis. The resulting
heterologous first extender module coding sequence can be utilized in conjunction
with a coding sequence for a PKS that synthesizes oleandolide, an oleandolide
derivative, or another polyketide.
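
The replacements listed above for the first extender module can be pictured as edits
to an ordered list of domains. The following Python sketch is illustrative only and
models nothing about DNA manipulation itself. The names ExtenderModule, swap_at, and
set_reductive are hypothetical, and the sketch assumes that the ACP is the last
domain of the module and that the reductive domains sit immediately before it.

```python
from dataclasses import dataclass, replace
from typing import Tuple

REDUCTIVE = {"KR", "DH", "ER"}


@dataclass(frozen=True)
class ExtenderModule:
    domains: Tuple[str, ...]  # assumed order, ACP last
    at_substrate: str         # specificity of the AT domain


# First extender module as described in the text: KS, AT, KR, ACP.
MODULE_1 = ExtenderModule(("KS", "AT", "KR", "ACP"), "methylmalonyl CoA")


def swap_at(module: ExtenderModule, substrate: str) -> ExtenderModule:
    """Model replacing the AT with one of a different specificity."""
    return replace(module, at_substrate=substrate)


def set_reductive(module: ExtenderModule, reductive: Tuple[str, ...]) -> ExtenderModule:
    """Model deleting, inserting, or replacing KR/DH/ER domains.

    Existing reductive domains are removed and `reductive` is placed
    immediately before the final ACP. An empty tuple models KR deletion.
    """
    if not set(reductive) <= REDUCTIVE:
        raise ValueError("only KR, DH, and ER may be supplied")
    core = tuple(d for d in module.domains if d not in REDUCTIVE)
    return replace(module, domains=core[:-1] + tuple(reductive) + core[-1:])


print(swap_at(MODULE_1, "malonyl CoA"))
print(set_reductive(MODULE_1, ()))                  # KR deleted
print(set_reductive(MODULE_1, ("DH", "KR")))        # KR replaced by DH and KR
print(set_reductive(MODULE_1, ("DH", "ER", "KR")))  # KR replaced by DH, ER, and KR
```

Because the dataclass is frozen, each call returns a new module and leaves MODULE_1
unchanged, which mirrors the way each hybrid module is a separate construct.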

Those of skill in the art will recognize, however, that deletion of the KR 
domain of module 1 or insertion of a DH domain or DH and KR domains into module 
1 will prevent the typical cyclization of the polyketide at the hydroxyl group created 
by the KR if such hybrid module is employed as a first extender module in a hybrid 

PKS or is otherwise involved in producing a portion of the polyketide at which
cyclization is to occur. Such deletions or insertions can be useful, however, to create
linear molecules or to induce cyclization at another site in the molecule.



WO 00/26349 PCT/US99/24478

As noted above, the invention also provides recombinant PKSs and 
recombinant DNA compounds and vectors that encode a PKS protein in which the KS 
domain of the first extender module has been inactivated. Such constructs are 
especially useful when placed in translational reading frame with the remaining 
modules and domains of an oleandolide or oleandolide derivative PKS, a hybrid PKS,
or a heterologous PKS. The utility of these constructs is that host cells expressing, or
cell free extracts containing, the PKS encoded thereby can be fed or supplied with
N-acylcysteamine thioesters of precursor molecules to prepare oleandolide derivative
compounds. See U.S. patent application Serial No. 60/117,384, filed 27 Jan. 1999,
and PCT publication Nos. WO 99/03986 and 97/02358, each of which is incorporated 
herein by reference. 

The recombinant DNA compounds of the invention that encode the second
extender module of the oleandolide PKS and the corresponding polypeptides encoded
thereby are useful for a variety of applications. In one embodiment, a DNA compound
comprising a sequence that encodes the oleandolide PKS second extender module is
inserted into a DNA compound that comprises the coding sequence for a heterologous
PKS. The resulting construct, in which the coding sequence for a module of the
heterologous PKS is either replaced by that for the second extender module of the
oleandolide PKS or the latter is merely added to coding sequences for the modules of
the heterologous PKS, provides a novel PKS. In another embodiment, a DNA
compound comprising a sequence that encodes the second extender module of the
oleandolide PKS is inserted into a DNA compound that comprises the coding
sequences for the oleandolide PKS or a recombinant oleandolide PKS that produces
an oleandolide derivative.

In another embodiment, a portion or all of the second extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a 
hybrid module. In this embodiment, the invention provides, for example, replacing the 
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 2- 
hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR; replacing the 
KR with another KR, a KR and a DH, or a KR, DH, and ER; and/or inserting a DH or a DH
and an ER. In addition, the KS and/or ACP can be replaced with another KS and/or 
ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, 
ER, or ACP coding sequence can originate from a coding sequence for another
module of the oleandolide PKS, from a coding sequence for a PKS that produces a
polyketide other than oleandolide, or from chemical synthesis. The resulting
heterologous second extender module coding sequence can be utilized in conjunction
with a coding sequence from a PKS that synthesizes oleandolide, an oleandolide
derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the third 
extender module of the oleandolide PKS and the corresponding polypeptides encoded 
thereby are useful for a variety of applications. In one embodiment, a DNA compound 
comprising a sequence that encodes the oleandolide PKS third extender module is
inserted into a DNA compound that comprises the coding sequence for a heterologous
PKS. The resulting construct, in which the coding sequence for a module of the
heterologous PKS is either replaced by that for the third extender module of the
oleandolide PKS or the latter is merely added to coding sequences for the modules of
the heterologous PKS, provides a novel PKS. In another embodiment, a DNA
compound comprising a sequence that encodes the third extender module of the
oleandolide PKS is inserted into a DNA compound that comprises coding sequences
for the oleandolide PKS or a recombinant oleandolide PKS that produces an
oleandolide derivative.

In another embodiment, a portion or all of the third extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or replacing the KR
with an active KR, or a KR and DH, or a KR, DH, and ER. In addition, the KS and/or
ACP can be replaced with another KS and/or ACP. In each of these replacements or
insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can
originate from a coding sequence for another module of the oleandolide PKS, from a
gene for a PKS that produces a polyketide other than oleandolide, or from chemical
synthesis. The resulting heterologous third extender module coding sequence can be
utilized in conjunction with a coding sequence for a PKS that synthesizes oleandolide,
an oleandolide derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fourth 
extender module of the oleandolide PKS and the corresponding polypeptides encoded
thereby are useful for a variety of applications. In one embodiment, a DNA compound
comprising a sequence that encodes the oleandolide PKS fourth extender module is
inserted into a DNA compound that comprises the coding sequence for a heterologous
PKS. The resulting construct, in which the coding sequence for a module of the
heterologous PKS is either replaced by that for the fourth extender module of the
oleandolide PKS or the latter is merely added to coding sequences for the modules of
the heterologous PKS, provides a novel PKS. In another embodiment, a DNA
compound comprising a sequence that encodes the fourth extender module of the
oleandolide PKS is inserted into a DNA compound that comprises coding sequences
for the oleandolide PKS or a recombinant oleandolide PKS that produces an
oleandolide derivative.

In another embodiment, a portion of the fourth extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all three
of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, DH, and
KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition, the KS and/or
ACP can be replaced with another KS and/or ACP. In each of these replacements or
insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can
originate from a coding sequence for another module of the oleandolide PKS (except
for the DH and ER domains), from a coding sequence for a PKS that produces a
polyketide other than oleandolide, or from chemical synthesis. The resulting
heterologous fourth extender module coding sequence can be utilized in conjunction
with a coding sequence for a PKS that synthesizes oleandolide, an oleandolide
derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fifth 
extender module of the oleandolide PKS and the corresponding polypeptides encoded 
thereby are useful for a variety of applications. In one embodiment, a DNA compound 

comprising a sequence that encodes the oleandolide PKS fifth extender module is
inserted into a DNA compound that comprises the coding sequence for a heterologous
PKS. The resulting construct, in which the coding sequence for a module of the
heterologous PKS is either replaced by that for the fifth extender module of the
oleandolide PKS or the latter is merely added to coding sequences for the modules of
the heterologous PKS, provides a novel PKS. In another embodiment, a DNA
compound comprising a sequence that encodes the fifth extender module of the
oleandolide PKS is inserted into a DNA compound that comprises the coding
sequence for the oleandolide PKS or a recombinant oleandolide PKS that produces an
oleandolide derivative.

In another embodiment, a portion or all of the fifth extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a 
hybrid module. In this embodiment, the invention provides, for example, replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR; inserting a DH
or a DH and ER; and/or replacing the KR with another KR, a DH and KR, or a DH,
KR, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or
ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR,
ER, or ACP coding sequence can originate from a coding sequence for another
module of the oleandolide PKS, from a coding sequence for a PKS that produces a
polyketide other than oleandolide, or from chemical synthesis. The resulting
heterologous fifth extender module coding sequence can be utilized in conjunction
with a coding sequence for a PKS that synthesizes oleandolide, an oleandolide
derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the sixth 
extender module of the oleandolide PKS and the corresponding polypeptides encoded 
thereby are useful for a variety of applications. In one embodiment, a DNA compound 
comprising a sequence that encodes the oleandolide PKS sixth extender module is
inserted into a DNA compound that comprises the coding sequence for a heterologous
PKS. The resulting construct, in which the coding sequence for a module of the
heterologous PKS is either replaced by that for the sixth extender module of the
oleandolide PKS or the latter is merely added to coding sequences for the modules of
the heterologous PKS, provides a novel PKS. In another embodiment, a DNA
compound comprising a sequence that encodes the sixth extender module of the
oleandolide PKS is inserted into a DNA compound that comprises the coding
sequences for the oleandolide PKS or a recombinant oleandolide PKS that produces
an oleandolide derivative.

In another embodiment, a portion or all of the sixth extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or replacing the
KR with another KR, a KR and DH, or a KR, DH, and an ER; and/or inserting a DH
or a DH and ER. In addition, the KS and/or ACP can be replaced with another KS
and/or ACP. In each of these replacements or insertions, the heterologous KS, AT,
DH, KR, ER, or ACP coding sequence can originate from a coding sequence for
another module of the oleandolide PKS, from a coding sequence for a PKS that
produces a polyketide other than oleandolide, or from chemical synthesis. The
resulting heterologous sixth extender module coding sequence can be utilized in
conjunction with a coding sequence for a PKS that synthesizes oleandolide, an
oleandolide derivative, or another polyketide.

The sixth extender module of the oleandolide PKS is followed by a
thioesterase domain. This domain is important in the cyclization of the polyketide and
its cleavage from the PKS. The present invention provides recombinant DNA
compounds that encode hybrid PKS enzymes in which the oleandolide PKS is fused
to a heterologous thioesterase or a heterologous PKS is fused to the oleandolide
synthase thioesterase. Thus, for example, a thioesterase domain coding sequence from
another PKS gene can be inserted at the end of the sixth (or other final) extender
module coding sequence in recombinant DNA compounds of the invention or the
oleandolide PKS thioesterase can be similarly fused to a heterologous PKS.
Recombinant DNA compounds encoding this thioesterase domain are useful in
constructing DNA compounds that encode the oleandolide PKS, a PKS that produces
an oleandolide derivative, and a PKS that produces a polyketide other than
oleandolide or an oleandolide derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result not
only:

(i) from fusions of heterologous domain (where heterologous means the
domains in that module are from at least two different naturally occurring modules)
coding sequences to produce a hybrid module coding sequence contained in a PKS
gene whose product is incorporated into a PKS,

but also:

(ii) from fusions of heterologous module (where heterologous module means
two modules are adjacent to one another that are not adjacent to one another in
naturally occurring PKS enzymes) coding sequences to produce a hybrid coding
sequence contained in a PKS gene whose product is incorporated into a PKS,

(iii) from expression of one or more oleandolide PKS genes with one or more
non-oleandolide PKS genes, including both naturally occurring and recombinant
non-oleandolide PKS genes, and

(iv) from combinations of the foregoing.

Various hybrid PKSs of the invention illustrating these various alternatives are
described herein.

An example of a hybrid PKS comprising fused modules results from fusion of
the loading module of either DEBS or the narbonolide PKS (see PCT patent
application No. US99/11814, incorporated herein by reference) with extender
modules 1 and 2 of the oleandolide PKS to produce a hybrid oleAI gene.
Co-expression of either one of these two hybrid oleAI genes with the oleAII and
oleAIII genes in suitable host cells, such as Streptomyces lividans, results in
expression of a hybrid PKS of the invention that produces 6-deoxyerythronolide B
(6-dEB) in recombinant host cells. Co-expression of either one of these two hybrid
oleAI genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes (picAII and
picAIII) results in the production of 3-keto-6-dEB.

Another example of a hybrid PKS comprising a hybrid module is prepared by
co-expressing the oleAI and oleAII genes with an oleAIII hybrid gene encoding
extender module 5 and the KS and AT of extender module 6 of the oleandolide PKS
fused to the ACP of extender module 6 and the TE of the narbonolide PKS. The
resulting hybrid PKS of the invention produces 3-deoxy-3-oxo-8,8a-deoxyoleandolide
(3-keto-oleandolide). This compound is useful in the production of 14-desmethyl
ketolides, compounds with potent anti-bacterial activity. This compound can also be
prepared by a recombinant oleandolide derivative PKS of the invention in which the
KR domain of module 6 of the oleandolide PKS has been deleted or replaced with an
inactive KR domain. Moreover, the invention provides hybrid PKSs in which not only
the above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-8,8a-deoxyoleandolide, a useful intermediate in the
preparation of 2,14-didesmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of the
invention resulting only from the latter change in the hybrid PKS just described. Thus,
co-expression of the oleAI and oleAII genes with a hybrid oleAIII gene in which the
AT domain of module 6 has been replaced by a malonyl-specific AT results in the
expression of a hybrid PKS that produces 2-desmethyl-8,8a-deoxyoleandolide in
recombinant host cells. This compound is a useful intermediate for making
2,14-didesmethyl erythromycins in recombinant host cells of the invention.

While many of the hybrid PKSs described above are composed primarily of
oleandolide PKS proteins, those of skill in the art recognize that the present invention
provides many different hybrid PKSs, including those composed of only a small
portion of the oleandolide PKS. For example, the present invention provides a hybrid
PKS in which a hybrid oleAI gene that encodes the oleandolide loading module fused
to extender modules 1 and 2 of DEBS is coexpressed with the eryAII and eryAIII
genes. The resulting hybrid PKS produces 8,8a-deoxyoleandolide. When the construct
is expressed in Saccharopolyspora erythraea host cells (either via integration into
the chromosome or via a vector that encodes the hybrid PKS), the
resulting recombinant host cell of the invention produces 14-desmethyl
erythromycins. Another illustrative example is the hybrid PKS of the invention
composed of the oleAI and eryAII and eryAIII gene products. This construct is also
useful in expressing 14-desmethyl erythromycins in Saccharopolyspora erythraea
host cells, as described in Example 3, below. In a preferred embodiment, the S.
erythraea host cells are eryAI mutants that do not produce 6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the products
of the picAI and picAII genes (the two proteins that comprise the loading module and
extender modules 1-4, inclusive, of the narbonolide PKS) and the oleAIII gene. The
resulting hybrid PKS produces the macrolide aglycone 3-hydroxy-narbonolide in
Streptomyces lividans host cells and the corresponding erythromycins in
Saccharopolyspora erythraea host cells. This hybrid PKS of the invention is
described in Example 5, below.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid 
PKS enzymes of the invention generally, can be expressed in a host cell that also 
expresses a functional oleP gene product. Such expression provides the compounds of 
the invention in which the C-8-C-8a epoxide is present. 

The following Table lists references describing illustrative PKS genes and 
corresponding enzymes that can be utilized in the construction of the recombinant 
hybrid PKSs and the corresponding DNA compounds that encode them of the 
invention. Also presented are various references describing tailoring enzymes and 
corresponding genes that can be employed in accordance with the methods of the 
invention. 
Avermectin

    U.S. Pat. No. 5,252,474 to Merck.

    MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
    Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A
    Comparison of the Genes Encoding the Polyketide Synthases for Avermectin,
    Erythromycin, and Nemadectin.

    MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
    Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Candicidin (FR008)

    Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Epothilone

    U.S. patent application Serial No. 60/130,560, filed 22 Apr. 1999, and Serial
    No. 60/122,620, filed 3 Mar. 1999.

Erythromycin

    PCT Pub. No. 93/13663 to Abbott.

    US Pat. No. 5,824,513 to Abbott.

    Donadio et al., 1991, Science 252: 675-9.

    Cortes et al., 8 Nov. 1990, Nature 345: 176-8, An unusually large
    multifunctional polypeptide in the erythromycin producing polyketide synthase of
    Saccharopolyspora erythraea.

Glycosylation Enzymes

    PCT Pat. App. Pub. No. 97/23630 to Abbott.

FK-506

    Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone ring
    of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.

    Motamedi et al., 1997, Structural organization of a multifunctional polyketide
    synthase involved in the biosynthesis of the macrolide immunosuppressant FK506,
    Eur. J. Biochem. 244: 74-80.

Methyltransferase

    US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from Streptomyces
    MA6858. 31-O-desmethyl-FK506 methyltransferase.

    Motamedi et al., 1996, Characterization of methyltransferase and hydroxylase
    genes involved in the biosynthesis of the immunosuppressants FK506 and FK520,
    J. Bacteriol. 178: 5243-5248.

FK-520

    U.S. patent application Serial No. 60/139,650, filed 17 Jun. 1999, and
    60/123,810, filed 11 Mar. 1999. See also Nielsen et al., 1991, Biochem. 30: 5789-96
    (enzymology of pipecolate incorporation).

Lovastatin

    U.S. Pat. No. 5,744,350 to Merck.

Narbomycin (and Picromycin)

    PCT patent application No. WO US99/11814, filed 28 May 1999.

Nemadectin

    MacNeil et al., 1993, supra.

Niddamycin

    Kakavas et al., 1997, Identification and characterization of the niddamycin
    polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179: 7515-7522.

Platenolide

    EP Pat. App. Pub. No. 791,656 to Lilly.

Rapamycin

    Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
    polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843.

    Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
    rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in the
    modular polyketide synthase, Gene 169: 9-16.

Rifamycin

    August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
    rifamycin: deductions from the molecular analysis of the rif biosynthetic gene cluster
    of Amycolatopsis mediterranei S669, Chemistry & Biology, 5(2): 69-79.

Soraphen

    U.S. Pat. No. 5,716,849 to Novartis.

    Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium cellulosum
    (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide Antibiotic
    Soraphen A: Cloning, Characterization, and Homology to Polyketide Synthase Genes
    from Actinomycetes.

Spiramycin

    U.S. Pat. No. 5,098,837 to Lilly.

Activator Gene

    U.S. Pat. No. 5,514,544 to Lilly.

Tylosin

    EP Pub. No. 791,655 to Lilly.

    Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide
    through the construction of a hybrid polyketide synthase.

    U.S. Pat. No. 5,876,991 to Lilly.

Tailoring enzymes

    Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355. Analysis of
    five tylosin biosynthetic genes from the tylBA region of the Streptomyces fradiae
    genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve 
as readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention. Methods for 
constructing hybrid PKS-encoding DNA compounds are described without reference
to the oleandolide PKS in U.S. Patent Nos. 5,672,491 and 5,712,146 and PCT 
publication No. 98/49315, each of which is incorporated herein by reference. 

In constructing hybrid PKSs of the invention, certain general methods may be
helpful. For example, it is often beneficial to retain the framework of the module to be
altered to make the hybrid PKS. Thus, if one desires to add DH and ER functionalities
to a module, it is often preferred to replace the KR domain of the original module
with a KR, DH, and ER domain-containing segment from another module, instead of
merely inserting DH and ER domains. One can alter the stereochemical specificity of
a module by replacement of the KS domain with a KS domain from a module that
specifies a different stereochemistry. See Lau et al., 1999, "Dissecting the role of
acyltransferase domains of modular polyketide synthases in the choice and
stereochemical fate of extender units", Biochemistry 38(5): 1643-1651, incorporated
herein by reference. One can alter the specificity of an AT domain by changing only a
small segment of the domain. See Lau et al., supra. One can also take advantage of
known linker regions in PKS proteins to link modules from two different PKSs to
create a hybrid PKS. See Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting
Intermodular Communication in Polyketide Synthases", Science 284: 482-485,
incorporated herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and often
are hybrids of more than two PKS genes. Even where only two genes are used, there
are often two or more modules in the hybrid gene in which all or part of the module is
derived from a second (or third) PKS gene. Thus, as one illustrative example, the
invention provides a hybrid PKS that contains the naturally occurring loading module
and thioesterase domain as well as extender modules one, two, four, and six of the
oleandolide PKS and further contains hybrid or heterologous extender modules three
and five. Hybrid or heterologous extender modules three and five contain AT domains
specific for malonyl CoA and derived from, for example, the rapamycin PKS genes.

To construct a hybrid PKS or oleandolide PKS of the invention, one can
employ a technique, described in PCT Pub. No. 98/27203 and U.S. provisional patent
application Serial No. 60/129,731, filed 16 Apr. 99, incorporated herein by reference,
in which the large oleandolide PKS gene cluster is divided into two or more, typically
three, segments, and each segment is placed on a separate expression vector. In this
manner, each of the segments of the gene can be altered, and various altered segments
can be combined in a single host cell to provide a recombinant PKS gene of the
invention. This technique makes more efficient the construction of large libraries of
recombinant PKS genes, vectors for expressing those genes, and host cells comprising
those vectors.

The invention also provides libraries of PKS genes, PKS proteins, and
ultimately, of polyketides, that are constructed by generating modifications in the 
oleandolide PKS so that the protein complexes produced have altered activities in one 
or more respects and thus produce polyketides other than the oleandolide natural 
product of the PKS. Novel polyketides may thus be prepared, or polyketides in 
general prepared more readily, using this method. By providing a large number of 
different genes or gene clusters derived from a naturally occurring PKS gene cluster, 
each of which has been modified in a different way from the native cluster, an 
effectively combinatorial library of polyketides can be produced as a result of the 
multiple variations in these activities. As will be further described below, the metes 
and bounds of this embodiment of the invention can be described on the polyketide, 
protein, and the encoding nucleotide sequence levels. 

As described above, a modular PKS "derived from" the oleandolide or other 
naturally occurring PKS includes a modular PKS (or its corresponding encoding 
gene(s)) that retains the scaffolding of the utilized portion of the naturally occurring
gene. Not all modules need be included in the constructs; the constructs can include a
loading module and six, fewer than six, or more than six extender modules. On the
constant scaffold, at least one enzymatic activity is mutated, deleted, replaced, or
inserted so as to alter the activity of the resulting PKS relative to the original PKS.
Alteration results when these activities are deleted or are replaced by a different 
version of the activity, or simply mutated in such a way that a polyketide other than 
the natural product results from these collective activities. This occurs because there 
has been a resulting alteration of the starter unit and/or extender unit, stereochemistry, 
chain length or cyclization, and/or reductive or dehydration cycle outcome at a 
corresponding position in the product polyketide. Where a deleted activity is replaced, 
the origin of the replacement activity may come from a corresponding activity in a 
different naturally occurring PKS or from a different region of the oleandolide PKS. 
Any or all of the oleandolide PKS genes may be included in the derivative or portions 
of any of these may be included, but the scaffolding of the PKS protein is retained in 
whatever derivative is constructed. The derivative preferably contains a thioesterase 
activity from the oleandolide or another PKS. 

Thus, a PKS derived from the oleandolide PKS includes a PKS that contains
the scaffolding of all or a portion of the oleandolide PKS. The derived PKS also
contains at least two extender modules that are functional, preferably three extender
modules, and more preferably four or more extender modules, and most preferably six
extender modules. The derived PKS also contains mutations, deletions, insertions, or
replacements of one or more of the activities of the functional modules of the
oleandolide PKS so that the nature of the resulting polyketide is altered at both the
protein and DNA sequence levels. Particular preferred embodiments include those
wherein a KS, AT, or ACP domain has been deleted or replaced by a version of the
activity from a different PKS or from another location within the same PKS. Also
preferred are derivatives where at least one non-condensation cycle enzymatic activity
(KR, DH, or ER) has been deleted or added or wherein any of these activities has
been mutated so as to change the structure of the polyketide synthesized by the PKS.

Conversely, also included within the definition of a PKS derived from the
oleandolide PKS are functional non-oleandolide PKS modules or their encoding genes
wherein at least one portion, or two or more portions, of the oleandolide PKS
activities have been inserted. Exemplary is the use of the oleandolide AT for extender
module 2, which accepts a methylmalonyl CoA extender unit rather than malonyl
CoA, to replace a malonyl specific AT in another PKS. Other examples include
insertion of portions of non-condensation cycle enzymatic activities or other regions
of oleandolide synthase activity into a heterologous PKS at both the DNA and protein
levels.

Thus, there are at least five degrees of freedom for constructing a hybrid PKS
in terms of the polyketide that will be produced. First, the polyketide chain length is
determined by the number of modules in the PKS, and the present invention includes
hybrid PKSs that contain a loading module and 6, as well as fewer or more than 6,
extender modules. Second, the nature of the carbon skeleton of the PKS is determined
by the specificities of the acyl transferases that determine the nature of the extender
units at each position, e.g., malonyl, methylmalonyl, ethylmalonyl, or other
substituted malonyl. Third, the loading module specificity also has an effect on the
resulting carbon skeleton of the polyketide. The loading module may use a different
starter unit, such as propionyl, butyryl, and the like. As noted above and in the
examples below, another method for varying loading module specificity involves
inactivating the KS activity in extender module 1 (KS1) and providing alternative
substrates, called diketides, that are chemically synthesized analogs of extender
module 1 diketide products, for extender module 2. This approach was illustrated in
PCT publication Nos. 97/02358 and 99/03986, incorporated herein by reference,
wherein the KS1 activity was inactivated through mutation. Fourth, the oxidation state
at various positions of the polyketide will be determined by the dehydratase and
reductase portions of the modules. This will determine the presence and location of
ketone and alcohol moieties and C-C double bonds or C-C single bonds in the
polyketide. Finally, the stereochemistry of the resulting polyketide is a function of
three aspects of the synthase. The first aspect is related to the AT/KS specificity
associated with substituted malonyls as extender units, which affects stereochemistry
only when the reductive cycle is missing or when it contains only a ketoreductase, as
the dehydratase would abolish chirality. Second, the specificity of the ketoreductase
may determine the chirality of any beta-OH. Finally, the enoylreductase specificity
for substituted malonyls as extender units may influence the stereochemistry when
there is a complete KR/DH/ER available.

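As a rough illustration of how these degrees of freedom multiply, the following
Python sketch models an extender module by its AT specificity and its set of
reductive domains, and derives the functionality expected at the beta-carbon. The
names used (ExtenderModule, beta_carbon_state, library_size, enumerate_modules) are
hypothetical and are not part of the invention; the mapping from reductive domains
to oxidation state follows the general description above, only three extender units
are considered, and stereochemistry and starter-unit choice are ignored.

```python
from dataclasses import dataclass
from itertools import product

# Extender-unit choices and reductive-loop variants named in the text above.
# Both tuples are illustrative assumptions, not an exhaustive list.
EXTENDER_UNITS = ("malonyl", "methylmalonyl", "ethylmalonyl")
REDUCTIVE_LOOPS = (
    frozenset(),
    frozenset({"KR"}),
    frozenset({"KR", "DH"}),
    frozenset({"KR", "DH", "ER"}),
)


@dataclass(frozen=True)
class ExtenderModule:
    """Hypothetical, simplified description of one extender module."""
    at_specificity: str
    reductive_domains: frozenset


def beta_carbon_state(module: ExtenderModule) -> str:
    """Return the functionality left at the beta-carbon by the module."""
    domains = module.reductive_domains
    if {"KR", "DH", "ER"} <= domains:
        return "methylene (C-C single bond)"
    if {"KR", "DH"} <= domains:
        return "alkene (C-C double bond)"
    if "KR" in domains:
        return "alcohol"
    return "ketone"


def enumerate_modules():
    """Yield every single-module variant in this simplified model."""
    for unit, loop in product(EXTENDER_UNITS, REDUCTIVE_LOOPS):
        yield ExtenderModule(unit, loop)


def library_size(n_modules: int) -> int:
    """Count module combinations when every position varies independently."""
    per_module = len(EXTENDER_UNITS) * len(REDUCTIVE_LOOPS)
    return per_module ** n_modules
```

Under these simplifying assumptions each extender position has twelve variants, so
library_size(6) counts the combinations for a six-module scaffold. Not every such
combination would be expected to yield a functional PKS.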
Thus, the modular PKS systems generally and the oleandolide PKS system
particularly permit a wide range of polyketides to be synthesized. As compared to the
aromatic PKS systems, the modular PKS systems accept a wider range of starter units,
including aliphatic monomers (acetyl, propionyl, butyryl, isovaleryl, etc.), aromatics
(aminohydroxybenzoyl), alicyclics (cyclohexanoyl), and heterocyclics (thiazolyl).
Certain modular PKSs have relaxed specificity for their starter units (Kao et al., 1994,
Science, supra). Modular PKSs also exhibit considerable variety with regard to the
choice of extender units in each condensation cycle. The degree of beta-ketoreduction
following a condensation reaction has also been shown to be altered by genetic
manipulation (Donadio et al., 1991, Science, supra; Donadio et al., 1993, Proc. Natl.
Acad. Sci. USA 90: 7119-7123). Likewise, the size of the polyketide product can be
varied by designing mutants with the appropriate number of modules (Kao et al.,
1994, J. Am. Chem. Soc. 116: 11612-11613). Lastly, modular PKS enzymes are
particularly well known for generating an impressive range of asymmetric centers in
their products in a highly controlled manner. The polyketides, antibiotics, and other
compounds produced by the methods of the invention are typically single
stereoisomeric forms. Although the compounds of the invention can occur as mixtures
of stereoisomers, it may be beneficial in some instances to generate individual
stereoisomers. Thus, the combinatorial potential within modular PKS pathways based
on any naturally occurring modular PKS scaffold, such as that of the oleandolide PKS,
is virtually unlimited.

While hybrid PKSs are most often produced by "mixing and matching"
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations
can be made to the native sequences using conventional techniques. The substrates for
mutation can be an entire cluster of genes or only one or two of them; the substrate for
mutation may also be portions of one or more of these genes. Techniques for mutation
include preparing synthetic oligonucleotides including the mutations and inserting the
mutated sequence into the gene encoding a PKS subunit using restriction
endonuclease digestion. See, e.g., Kunkel, 1985, Proc. Natl. Acad. Sci. USA 82: 448;
Geisselsoder et al., 1987, BioTechniques 5: 786. Alternatively, the mutations can be
effected using a mismatched primer (generally 10-20 nucleotides in length) that
hybridizes to the native nucleotide sequence, at a temperature below the melting
temperature of the mismatched duplex. The primer can be made specific by keeping
primer length and base composition within relatively narrow limits and by keeping the
mutant base centrally located. See Zoller and Smith, 1983, Methods Enzymol.
100: 468. Primer extension is effected using DNA polymerase, the product cloned, and
clones containing the mutated DNA, derived by segregation of the primer extended
strand, selected. Identification can be accomplished using the mutant primer as a
hybridization probe. The technique is also applicable for generating multiple point
mutations. See, e.g., Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79:
6409. PCR mutagenesis can also be used to effect the desired mutations.
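The primer design rule described above (a short primer with the mutant base
near its center) can be sketched in Python. This is a minimal illustration under stated
assumptions: the function names, the default length of 17 nucleotides, and the
representation of the native sequence as a plain string are hypothetical, and no
melting-temperature calculation is attempted.

```python
def design_mismatch_primer(template: str, position: int, new_base: str,
                           length: int = 17) -> str:
    """Return a primer-length window of `template` with one central mismatch.

    `template` is the native sequence (5'->3'), `position` is a 0-based index
    of the base to change, and `length` is kept within the 10-20 nt range
    mentioned in the text. All names and defaults are illustrative.
    """
    template = template.upper()
    new_base = new_base.upper()
    if not 10 <= length <= 20:
        raise ValueError("primer length should be 10-20 nucleotides")
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if not 0 <= position < len(template):
        raise ValueError("position lies outside the template")
    if template[position] == new_base:
        raise ValueError("new_base matches the native base; no mutation")
    half = length // 2
    start = position - half
    end = start + length
    if start < 0 or end > len(template):
        raise ValueError("mutation too close to the template end to center")
    window = template[start:end]
    # Replace only the central base; every flanking base still matches.
    return window[:half] + new_base + window[half + 1:]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the strand that anneals to `seq`."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))
```

The first function returns the mutated sequence in the same sense as the template
supplied; reverse_complement can be applied when the primer must anneal to that
strand rather than to its complement.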

Random mutagenesis of selected portions of the nucleotide sequences
encoding enzymatic activities can also be accomplished by several different
techniques known in the art, e.g., by inserting an oligonucleotide linker randomly into
a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by
preparing synthetic mutants, or by damaging plasmid DNA in vitro with chemicals.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, hydroxylamine, agents which damage or remove bases thereby
preventing normal base-pairing such as hydrazine or formic acid, analogues of
nucleotide precursors such as 5-bromouracil, 2-aminopurine, or acridine intercalating
agents such as proflavine, acriflavine, quinacrine, and the like. Generally, plasmid
DNA or DNA fragments are treated with chemical mutagens, transformed into E. coli
and propagated as a pool or library of mutant plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic
activity, i.e., regions encoding corresponding activities from different PKSs
or from different locations in the same PKS, can be recovered, for example, using
PCR techniques with appropriate primers. By "corresponding" activity encoding
regions is meant those regions encoding the same general type of activity. For
example, a KR activity encoded at one location of a gene cluster "corresponds" to a
KR encoding activity in another location in the gene cluster or in a different gene
cluster. Similarly, a complete reductase cycle could be considered corresponding. For
example, KR/DH/ER can correspond to a KR alone.

If replacement of a particular target region in a host PKS is to be made, this
replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in PCT
publication No. WO 96/40968, incorporated herein by reference. The vectors used to
perform the various operations to replace the enzymatic activity in the host PKS genes
or to support mutations in these regions of the host PKS genes can be chosen to
contain control sequences operably linked to the resulting coding sequences in a
manner such that expression of the coding sequences can be effected in an appropriate
host.

However, simple cloning vectors may be used as well. If the cloning vectors
employed to obtain PKS genes encoding derived PKS lack control sequences for
expression operably linked to the encoding nucleotide sequences, the nucleotide
sequences are inserted into appropriate expression vectors. This need not be done
individually, but a pool of isolated encoding nucleotide sequences can be inserted into
expression vectors, the resulting vectors transformed or transfected into host cells, and
the resulting cells plated out into individual colonies. The invention provides a variety
of recombinant DNA compounds in which the various coding sequences for the
domains and modules of the oleandolide PKS are flanked by non-naturally occurring
restriction enzyme recognition sites.

The various PKS nucleotide sequences can be cloned into one or more
recombinant vectors as individual cassettes, with separate control elements, or under
the control of, e.g., a single promoter. The PKS subunit encoding regions can include
flanking restriction sites to allow for the easy deletion and insertion of other PKS
subunit encoding sequences so that hybrid PKSs can be generated. The design of such
unique restriction sites is known to those of skill in the art and can be accomplished
using the techniques described above, such as site-directed mutagenesis and PCR.

The expression vectors containing nucleotide sequences encoding a variety of
PKS enzymes for the production of different polyketides are then transformed into the
appropriate host cells to construct the library. In one straightforward approach, a
mixture of such vectors is transformed into the selected host cells and the resulting
cells plated into individual colonies and selected to identify successful transformants.
Each individual colony has the ability to produce a particular PKS and
ultimately a particular polyketide. Typically, there will be duplications in some, most,
or all of the colonies; the subset of the transformed colonies that contains a different
PKS in each member colony can be considered the library. Alternatively, the
expression vectors can be used individually to transform hosts, which transformed
hosts are then assembled into a library. A variety of strategies are available to obtain a
multiplicity of colonies each containing a PKS gene cluster derived from the naturally
occurring host gene cluster so that each colony in the library produces a different PKS
and ultimately a different polyketide. The number of different polyketides that are
produced by the library is typically at least four, more typically at least ten, and
preferably at least 20, and more preferably at least 50, reflecting similar numbers of
different altered PKS gene clusters and PKS gene products. The number of members
in the library is arbitrarily chosen; however, the degrees of freedom outlined above
with respect to the variation of starter, extender units, stereochemistry, oxidation state,
and chain length enable the production of quite large libraries.
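The statement that the library is the subset of transformed colonies in which each
member carries a different PKS amounts to a de-duplication step, sketched below in
Python. The colony identifiers and the representation of a PKS as a hashable
description (here, a tuple of labels) are hypothetical and chosen only for
illustration.

```python
def build_library(colonies: dict) -> dict:
    """Keep one colony per distinct PKS.

    `colonies` maps a colony identifier to a hashable description of the PKS
    that colony carries. The first colony seen for each PKS is retained and
    later duplicates are dropped.
    """
    library = {}
    seen = set()
    for colony_id, pks in colonies.items():
        if pks not in seen:
            seen.add(pks)
            library[colony_id] = pks
    return library
```

For example, three colonies of which two carry the same construct would yield a
two-member library under this sketch.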

Methods for introducing the recombinant vectors of the invention into suitable
hosts are known to those of skill in the art and typically include the use of CaCl2 or
agents such as other divalent cations, lipofection, DMSO, PEG, protoplast
transformation, infection, transfection, and electroporation. The polyketide producing
colonies can be identified and isolated using known techniques and the produced
polyketides further characterized. The polyketides produced by these colonies can be
used collectively in a panel to represent a library or may be assessed individually for
activity.

The libraries of the invention can thus be considered at four levels: (1) a 
multiplicity of colonies each with a different PKS encoding sequence; (2) the proteins 
produced from the coding sequences; (3) the polyketides produced from the proteins 
assembled into a functional PKS; and (4) antibiotics or compounds with other desired 
activities derived from the polyketides. Combination libraries can also be constructed 
wherein members of a library derived, for example, from the oleandolide PKS can be 
considered as a part of the same library as those derived from, for example, the 
rapamycin PKS or DEBS. 

Colonies in the library are induced to produce the relevant synthases and thus
to produce the relevant polyketides to obtain a library of polyketides. Polyketides that
are secreted into the media or have been otherwise isolated can be screened for
binding to desired targets, such as receptors, signaling proteins, and the like. The
supernatants per se can be used for screening, or partial or complete purification of
the polyketides can first be effected. Typically, such screening methods involve
detecting the binding of each member of the library to a receptor or other target ligand.
Binding can be detected either directly or through a competition assay. Means to
screen such libraries for binding are well known in the art. Alternatively, individual
polyketide members of the library can be tested against a desired target. In this event,
screens wherein the biological response of the target is measured can more readily be
included. Antibiotic activity can be verified using typical screening assays such as
those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137: 167-173, incorporated
herein by reference, and in Example 7, below.

The invention provides methods for the preparation of a large number of
polyketides. These polyketides are useful intermediates in formation of compounds
with antibiotic or other activity through hydroxylation and glycosylation reactions as
described above. In general, the polyketide products of the PKS must be further
modified, typically by hydroxylation and glycosylation, to exhibit antibiotic activity.
Hydroxylation results in the novel polyketides of the invention that contain hydroxyl
groups at C-6, which can be accomplished using the hydroxylase encoded by the eryF
gene, and/or C-12, which can be accomplished using the hydroxylase encoded by the
picK or eryK gene. Also, the present invention provides the oleP gene in recombinant
form, which can be used to express the oleP gene product in any host cell. A host cell,
such as a Streptomyces host cell or a Saccharopolyspora erythraea host cell modified
to express the oleP gene thus can be used to produce polyketides comprising the
C-8-C-8a epoxide present in oleandomycin. Thus the invention provides such modified
polyketides. The presence of hydroxyl groups at these positions can enhance the
antibiotic activity of the resulting compound relative to its unhydroxylated
counterpart.

Methods for glycosylating the polyketides are generally known in the art; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic means as
described herein and in PCT publication No. WO 98/49315, incorporated herein by
reference. Preferably, glycosylation with desosamine is effected in accordance with
the methods of the invention in recombinant host cells provided by the invention. In
general, the approaches to effecting glycosylation mirror those described above with
respect to hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosyl transferases. In addition, synthetic chemical methods
may be employed.

The antibiotic modular polyketides may contain any of a number of different
sugars, although D-desosamine, or a close analog thereof, is most common.
Erythromycin, picromycin, narbomycin, and methymycin contain desosamine.
Erythromycin also contains L-cladinose (3-O-methyl mycarose). Tylosin contains
mycaminose (4-hydroxy desosamine), mycarose and 6-deoxy-D-allose.
2-acetyl-1-bromodesosamine has been used as a donor to glycosylate polyketides by
Masamune et al., 1975, J. Am. Chem. Soc. 97: 3512-3513. Other, apparently more
stable donors include glycosyl fluorides, thioglycosides, and trichloroacetimidates; see
Woodward et al., 1981, J. Am. Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem.
Soc. 119: 3193; Toshima et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al.,
1988, Tetrahedron Lett. 29: 3575. Glycosylation can also be effected using the
polyketide aglycones as starting materials and using Saccharopolyspora erythraea,
Streptomyces venezuelae, or other host cells to make the conversion, preferably using
mutants unable to synthesize macrolides, as discussed in the preceding Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS
enzymes of the invention. These polyketides are useful as antibiotics and as
intermediates in the synthesis of other useful compounds, as described in the
following section.

Section IV: Compounds 

The methods and recombinant DNA compounds of the invention are useful in
the production of polyketides. In one important aspect, the invention provides
methods for making antibiotic compounds related in structure to oleandomycin and
erythromycin, both potent antibiotic compounds. The invention also provides novel
ketolide compounds, polyketide compounds with potent antibiotic activity of
significant interest due to activity against antibiotic resistant strains of bacteria. See
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by reference.
Most if not all of the ketolides prepared to date are synthesized using erythromycin A,
a derivative of 6-dEB, as an intermediate. While the invention provides hybrid PKSs
that produce a polyketide different in structure from 6-dEB, the invention also
provides methods for making intermediates useful in preparing traditional, 6-dEB-
and erythromycin-derived ketolide compounds.

Because 6-dEB in part differs from oleandolide in that it comprises a 13-ethyl
instead of a 13-methyl group, the novel hybrid PKS genes of the invention based on
the oleandolide PKS provide many novel ketolides that differ from the known
ketolides only in that they have a 13-methyl instead of 13-ethyl group. Thus, the
invention provides the 13-methyl analogues of the ketolides and intermediates and
precursor compounds described in, for example, Griesgraber et al., supra; Agouridas
et al., 1998, J. Med. Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579; 5,760,233;
5,750,510; 5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118;
5,543,400; 5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT publication Nos.
WO 98/09978 and 98/28316, each of which is incorporated herein by reference.

As noted above, the hybrid PKS genes of the invention can be expressed in a 
host cell that contains the desosamine biosynthetic genes and desosaminyl transferase 
gene as well as the required hydroxylase gene(s), which may be either picK (for the 
C-12 position) or eryK (for the C-12 position) and/or eryF (for the C-6 position). The
resulting compounds have antibiotic activity but can be further modified, as described 
in the patent publications referenced above, to yield a desired compound with 
improved or otherwise desired properties. Alternatively, the aglycone compounds can 
be produced in the recombinant host cell, and the desired glycosylation and 
hydroxylation steps carried out in vitro or in vivo, in the latter case by supplying the 
converting cell with the aglycone. 

The compounds of the invention are thus optionally glycosylated forms of the
polyketide set forth in formula (1) below which are hydroxylated at either the C-6 or
the C-12 or both. The compounds of formula (1) can be prepared using the loading
and the six extender modules of a modular PKS, modified or prepared in hybrid form
as herein described. These polyketides have the formula:

[Chemical structure of formula (1) not reproduced]

including the glycosylated and isolated stereoisomeric forms thereof;

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated
substituted or unsubstituted hydrocarbyl of 1-15C;

each of R1-R6 is independently H or alkyl (1-4C) wherein any alkyl at R1 may
optionally be substituted;

each of X1-X5 is independently two H, H and OH, or =O; or

each of X1-X5 is independently H and the compound of formula (2) contains a
double-bond in the ring adjacent to the position of said X at 2-3, 4-5, 6-7, 8-9 and/or
10-11;

with the proviso that:

at least two of R1-R6 are alkyl (1-4C).
Preferred compounds comprising formula 2 are those wherein at least three of
R1-R5 are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at least
four of R1-R5 are alkyl (1-4C), preferably methyl or ethyl. Also preferred are those
wherein X2 is two H, =O, or H and OH, and/or X3 is H, and/or X1 is OH and/or X4 is
OH and/or X5 is OH. Also preferred are compounds with variable R* when R1-R5 are
methyl, X2 is =O, and X1, X4 and X5 are OH. The glycosylated forms of the foregoing
are also preferred; glycoside residues can be attached at C-3, C-5, and/or C-6; the
epoxidated forms are also included, i.e., an epoxide at C-8-C-8a.

As described above, there are a wide variety of diverse organisms that can
modify compounds such as those described herein to provide compounds with or that
can be readily modified to have useful activities. For example, Saccharopolyspora
erythraea can convert oleandolide and 6-dEB to a variety of useful compounds. The
compounds provided by the present invention can be provided to cultures of
Saccharopolyspora erythraea and converted to the corresponding derivatives of
erythromycins A, B, C, and D in accordance with the procedure provided in Example
6, below. To ensure that only the desired compound is produced, one can use an S.
erythraea eryA mutant that is unable to produce 6-dEB but can still carry out the
desired conversions (Weber et al., 1985, J. Bacteriol. 164(1): 425-433). Also, one can
employ other mutant strains, such as eryB, eryC, eryG, and/or eryK mutants, or
mutant strains having mutations in multiple genes, to accumulate a preferred
compound. The conversion can also be carried out in large fermentors for commercial
production. Each of the erythromycins A, B, C, and D has antibiotic activity, although
erythromycin A has the highest antibiotic activity. Moreover, each of these
compounds can form, under treatment with mild acid, a C-6 to C-9 hemiketal with
motilide activity. For formation of hemiketals with motilide activity, erythromycins
B, C, and D are preferred, as the presence of a C-12 hydroxyl allows the formation of
an inactive compound that has a hemiketal formed between C-9 and C-12.

Thus, the present invention provides the compounds produced by
hydroxylation and glycosylation of the compounds of the invention by action of the
enzymes endogenous to Saccharopolyspora erythraea and mutant strains of S.
erythraea. Such compounds are useful as antibiotics or as motilides directly or after
chemical modification. For use as antibiotics, the compounds of the invention can be
used directly without further chemical modification. Erythromycins A, B, C, and D all
have antibiotic activity, and the corresponding compounds of the invention that result
from the compounds being modified by Saccharopolyspora erythraea also have
antibiotic activity. These compounds can be chemically modified, however, to
provide other compounds of the invention with potent antibiotic activity. For
example, alkylation of erythromycin at the C-6 hydroxyl can be used to produce
potent antibiotics (clarithromycin is C-6-O-methyl), and other useful modifications
are described in, for example, Griesgraber et al., 1996, J. Antibiot. 49: 465-477;
Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579;
5,760,233; 5,750,510; 5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614;
5,556,118; 5,543,400; 5,527,780; 5,444,051; 5,439,890; and 5,439,889; and PCT
publication Nos. WO 98/09978 and 98/28316, each of which is incorporated herein by
reference.

For use as motilides, the compounds of the invention can be used directly
without further chemical modification. Erythromycin and certain erythromycin
analogs are potent agonists of the motilin receptor that can be used clinically as
prokinetic agents to induce phase III of migrating motor complexes, to increase
esophageal peristalsis and LES pressure in patients with GERD, to accelerate gastric
emptying in patients with gastric paresis, and to stimulate gall bladder contractions in
patients after gallstone removal and in diabetics with autonomic neuropathy. See
Peeters, 1999, Motilide Web Site, http://www.med.kuleuven.ac.be/med/gih/motilid.htm,
and Omura et al., 1987, Macrolides with gastrointestinal motor stimulating activity,
J. Med. Chem. 30: 1941-3. The corresponding compounds
of the invention that result from the compounds of the invention being modified by
Saccharopolyspora erythraea also have motilide activity, particularly after
conversion, which can also occur in vivo, to the C-6 to C-9 hemiketal by treatment
with mild acid. Compounds lacking the C-12 hydroxyl are especially preferred for use
as motilin agonists. These compounds can also be further chemically modified,
however, to provide other compounds of the invention with potent motilide activity.

Moreover, and also as noted above, there are other useful organisms that can
be employed to hydroxylate and/or glycosylate the compounds of the invention. As
described above, the organisms can be mutants unable to produce the polyketide
normally produced in that organism, the fermentation can be carried out on plates or
in large fermentors, and the compounds produced can be chemically altered after
fermentation. In addition to Saccharopolyspora erythraea, Streptomyces venezuelae,
S. narbonensis, S. antibioticus, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans can also be used. In addition to antibiotic activity, compounds of the
invention produced by treatment with M. megalomicea enzymes can have
antiparasitic activity as well. Thus, the present invention provides the compounds
produced by hydroxylation and glycosylation by action of the enzymes endogenous to
S. erythraea, S. venezuelae, S. narbonensis, S. antibioticus, M. megalomicea, S.
fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention directly
in the host cell of interest. Thus, the recombinant genes of the invention, which
include recombinant oleAI, oleAII, and oleAIII genes with one or more deletions
and/or insertions, including replacements of an oleA gene fragment with a gene
fragment from a heterologous PKS gene, can be included on expression vectors
suitable for expression of the encoded gene products in Saccharopolyspora erythraea,
Micromonospora megalomicea, Streptomyces antibioticus, S. venezuelae, S.
narbonensis, S. fradiae, and S. thermotolerans.

Many of the compounds of the invention contain one or more chiral centers,
and all of the stereoisomers are included within the scope of the invention, as pure
compounds as well as mixtures of stereoisomers. Thus the compounds of the
invention may be supplied as a mixture of stereoisomers in any proportion.

The compounds of the invention can be produced by growing and fermenting
the host cells of the invention under conditions known in the art for the production of
other polyketides. The compounds of the invention can be isolated from the
fermentation broths of these cultured cells and purified by standard procedures. The
compounds can be readily formulated to provide the pharmaceutical compositions of
the invention. The pharmaceutical compositions of the invention can be used in the
form of a pharmaceutical preparation, for example, in solid, semisolid, or liquid form.
This preparation will contain one or more of the compounds of the invention as an
active ingredient in admixture with an organic or inorganic carrier or excipient
suitable for external, enteral, or parenteral application. The active ingredient may be
compounded, for example, with the usual non-toxic, pharmaceutically acceptable
carriers for tablets, pellets, capsules, suppositories, solutions, emulsions, suspensions,
and any other form suitable for use.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin,
colloidal silica, potato starch, urea, and other carriers suitable for use in
manufacturing preparations, in solid, semi-solid, or liquified form. In addition,
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used. For
example, the compounds of the invention may be utilized with hydroxypropyl
methylcellulose essentially as described in U.S. Patent No. 4,916,138, incorporated
herein by reference, or with a surfactant essentially as described in EPO patent
publication No. 428,169, incorporated herein by reference.

Oral dosage forms may be prepared essentially as described by Hondo et al.,
1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein by
reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by reference.
The active compound is included in the pharmaceutical composition in an amount
sufficient to produce the desired effect upon the disease process or condition.

For the treatment of conditions and diseases caused by infection, a compound
of the invention may be administered orally, topically, parenterally, by inhalation
spray, or rectally in dosage unit formulations containing conventional non-toxic
pharmaceutically acceptable carriers, adjuvants, and vehicles. The term parenteral, as
used herein, includes subcutaneous injections, and intravenous, intramuscular, and
intrasternal injection or infusion techniques.

Dosage levels of the compounds of the invention are of the order from about
0.01 mg to about 50 mg per kilogram of body weight per day, preferably from about
0.1 mg to about 10 mg per kilogram of body weight per day. The dosage levels are
useful in the treatment of the above-indicated conditions (from about 0.7 mg to about
3.5 g per patient per day, assuming a 70 kg patient). In addition, the compounds of
the invention may be administered on an intermittent basis, i.e., at semi-weekly,
weekly, semi-monthly, or monthly intervals.
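
The per-patient figures follow from multiplying the per-kilogram range by body weight. The short Python sketch below works through that arithmetic for the 70 kg reference patient named in the text. The function name and structure are illustrative assumptions and not part of the disclosure.

```python
import math


def daily_dose_range_mg(weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Scale a per-kilogram daily dose range to a whole-patient range in mg."""
    return weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg


# Broad range stated in the text: 0.01 to 50 mg/kg/day.
broad_low, broad_high = daily_dose_range_mg(70, 0.01, 50)

# Preferred range stated in the text: 0.1 to 10 mg/kg/day.
pref_low, pref_high = daily_dose_range_mg(70, 0.1, 10)

if __name__ == "__main__":
    # Rounded for display; floating-point products are not exact decimals.
    print(f"broad range:     {broad_low:.1f} mg to {broad_high / 1000:.1f} g per day")
    print(f"preferred range: {pref_low:.0f} mg to {pref_high:.0f} mg per day")
```

For a 70 kg patient the broad range works out to roughly 0.7 mg to 3.5 g per day, and the preferred range to roughly 7 mg to 700 mg per day.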

The amount of active ingredient that may be combined with the carrier
materials to produce a single dosage form will vary depending upon the host treated
and the particular mode of administration. For example, a formulation intended for
oral administration to humans may contain from 0.5 mg to 5 g of active agent
compounded with an appropriate and convenient amount of carrier material, which
may vary from about 5 percent to about 95 percent of the total composition. Dosage
unit forms will generally contain from about 0.5 mg to about 500 mg of active
ingredient. For external administration, the compounds of the invention may be
formulated within the range of, for example, 0.00001% to 60% by weight, preferably
from 0.001% to 10% by weight, and most preferably from about 0.005% to 0.8% by
weight.

It will be understood, however, that the specific dose level for any particular
patient will depend on a variety of factors. These factors include the activity of the
specific compound employed; the age, body weight, general health, sex, and diet of
the subject; the time and route of administration and the rate of excretion of the drug;
whether a drug combination is employed in the treatment; and the severity of the
particular disease or condition for which therapy is sought.

The compounds of the invention can be used as single therapeutic agents or in
combination with other therapeutic agents. Drugs that can be usefully combined with
compounds of the invention include one or more antibiotic or motilide agents.

A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the invention and shall
not be construed as being a limitation on the scope of the invention or claims.

Example 1 
General Methodology 
Bacterial strains, plasmids, and culture conditions. Streptomyces coelicolor
CH999, described in WO 95/08548, published 30 March 1995, or S. lividans K4-114
or K4-155, described in Ziermann and Betlach, Jan. 99, Recombinant Polyketide
Synthesis in Streptomyces: Engineering of Improved Host Strains, BioTechniques
26: 106-110, incorporated herein by reference, was used as an expression host. DNA
manipulations were performed in Escherichia coli XL1-Blue, available from
Stratagene. E. coli MC1061 is also suitable for use as a host for plasmid
manipulation. Plasmids were passaged through E. coli ET12567 (dam dcm hsdS Cmr)
(MacNeil, 1988, J. Bacteriol. 170: 5607, incorporated herein by reference) to
generate unmethylated DNA prior to transformation of S. coelicolor or
Saccharopolyspora erythraea. E. coli strains were grown under standard conditions.
S. coelicolor strains were grown on R2YE agar plates (Hopwood et al., Genetic
manipulation of Streptomyces. A laboratory manual. The John Innes Foundation:
Norwich, 1985, incorporated herein by reference).

Many of the expression vectors of the invention illustrated in the examples are
derived from plasmid pRM5, described in WO 95/08548, incorporated herein by
reference. This plasmid includes a colEI replicon, an appropriately truncated SCP2*
Streptomyces replicon, two act-promoters, the actI and actIII promoters, to allow for
bidirectional cloning, the gene encoding the actII-ORF4 activator which induces
transcription from act promoters during the transition from growth phase to
stationary phase, and appropriate marker genes. Engineered restriction sites in the
plasmid facilitate the combinatorial construction of PKS gene clusters starting from
cassettes encoding individual domains of naturally occurring PKSs. When plasmid
pRM5 is used for expression of a PKS, all relevant biosynthetic genes can be
plasmid-borne and therefore amenable to facile manipulation and mutagenesis in
E. coli. This plasmid is also suitable for use in Streptomyces host cells. Streptomyces
is genetically and physiologically well characterized and expresses the ancillary
activities required for in vivo production of most polyketides. Plasmid pRM5 utilizes
the act promoter for PKS gene expression, so polyketides are produced in a
secondary metabolite-like manner, thereby alleviating the toxic effects of
synthesizing potentially bioactive compounds in vivo.

Manipulation of DNA and organisms. Polymerase chain reaction (PCR) was
performed using Pfu polymerase (Stratagene; Taq polymerase from Perkin Elmer
Cetus can also be used) under conditions recommended by the enzyme manufacturer.
Standard in vitro techniques were used for DNA manipulations (Sambrook et al.,
Molecular Cloning: A Laboratory Manual (Current Edition)). E. coli was transformed
using standard calcium chloride-based methods; a Bio-Rad E. coli pulsing apparatus
and protocols provided by Bio-Rad could also be used. S. coelicolor was transformed
by standard procedures (Hopwood et al., Genetic manipulation of Streptomyces. A
laboratory manual. The John Innes Foundation: Norwich, 1985), and depending on
what selectable marker was employed, transformants were selected using 1 mL of a
1.5 mg/mL thiostrepton overlay, 1 mL of a 2 mg/mL apramycin overlay, or both.

Example 2
Cloning of the Oleandomycin Biosynthetic Gene Cluster from Streptomyces antibioticus
Genomic DNA (100 ng) was isolated from an oleandomycin producing strain
of Streptomyces antibioticus (ATCC 11891) using standard procedures. The genomic
DNA was partially digested with restriction enzyme Sau3AI to generate fragments
~40 kbp in length, which were cloned into the commercially available Supercos™
cosmid vector that had been digested with restriction enzymes XbaI and BamHI to
produce a genomic library. SuperCosI™ (Stratagene) DNA cosmid arms were
prepared as directed by the manufacturer. A cosmid library was prepared by ligating
2.5 ng of the digested genomic DNA with 1.5 µg of cosmid arms in a 20 µL reaction.
One microliter of the ligation mixture was propagated in E. coli XL1-Blue MR
(Stratagene) using a Gigapack III XL packaging extract kit (Stratagene).

This library was then probed with a radioactively-labeled probe generated by
PCR from Streptomyces antibioticus DNA using primers complementary to known
sequences of KS domains hypothesized to originate from extender modules 5 and 6 of
the oleandolide PKS. This probing identified about 30 different colonies, which were
pooled, replated, and probed again, resulting in the identification of 9 cosmids. These
latter cosmids were isolated and transformed into the commercially available E. coli
strain XL-1 Blue. Plasmid DNA was isolated and analyzed by restriction enzyme
digestion, which revealed that the entire PKS gene cluster was contained in
overlapping segments on two of the cosmids identified. DNA sequence analysis using
the T3 primer showed that the desired DNA had been isolated.

Further analysis of these cosmids and subclones prepared from the cosmids
facilitated the identification of the location of various oleandolide PKS ORFs,
modules in those ORFs, and coding sequences for oleandomycin modification
enzymes. The location of these genes and modules is shown on Figure 1. Figure 1
shows that the complete oleandolide PKS gene cluster is contained within the insert
DNA of cosmids pKOS055-1 (insert size of ~43 kb) and pKOS055-5 (insert size of
~47 kb). Each of these cosmids has been deposited with the American Type Culture
Collection in accordance with the terms of the Budapest Treaty (cosmid pKOS055-1
is available under accession no. ATCC 203798; cosmid pKOS055-5 is available under
accession no. ATCC 203799). Various additional reagents of the invention can
therefore be isolated from these cosmids. DNA sequence analysis was also performed
on the various subclones of the invention, as described above.

Example 3 

Expression of an Oleandolide/DEBS Hybrid PKS in Saccharopolyspora erythraea 
This Example describes the construction of an expression vector, plasmid
pKOS039-110, that can integrate into the chromosome of Saccharopolyspora
erythraea due to the phage phiC31 attachment and integration functions present on
the plasmid and drive expression of the oleAI gene product under the control of the
ermE* promoter. A restriction site and function map of plasmid pKOS039-110 is
shown in Figure 3 of the accompanying drawings. The expression of the oleAI gene
product in a host cell that naturally produces the eryA gene products results in the
formation of a functional hybrid PKS of the present invention composed of the oleAI,
eryAII, and eryAIII gene products and the concomitant production of 13-methyl
erythromycins. While the specific plasmids and vectors utilized in the construction are
described herein, those of skill in the art will recognize that equivalent expression
vectors of the invention can be readily constructed from publicly available materials
and the oleA gene containing cosmids of the present invention deposited with the
ATCC.

Plasmid pKOS039-98 is a cloning vector containing convenient restriction
sites that was constructed by inserting a polylinker oligonucleotide, containing a
restriction enzyme recognition site for PacI, a Shine-Dalgarno sequence, and
restriction enzyme recognition sites for NdeI, BglII, and HindIII, into a pUC19
derivative, called pKOS24-47. Plasmid pKOS039-98 (see PCT patent application No.
WO US99/11814, incorporated herein by reference) was digested with restriction
enzymes PacI and EcoRI and ligated to a polylinker composed of the oligonucleotides
N39-51 and N39-52 having the following sequence:

N39-51: 5'-TAAGGAGGACCATATGCATCGCTCGAGTCTAGACCTAGG-3'
N39-52: 5'-AATTCCTAGGTCTAGACTCGAGCGATGCATATGGTCCTCCTTAAT-3',
which thus includes the following restriction enzyme recognition sites in
the order shown: NdeI-NsiI-XhoI-XbaI-EcoRI, to yield plasmid pKOS039-105.
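
The stated order of sites can be checked directly against the two oligonucleotide sequences. The Python sketch below is illustrative only. It assumes the standard published recognition sequences for NdeI, NsiI, XhoI, XbaI, and AvrII (these are not given in the text), and the helper names are hypothetical. It lists the sites found in N39-51 in order, then anneals N39-52 to it to show which bases remain single-stranded at each end.

```python
# Assumed standard recognition sequences (not taken from the text).
RECOGNITION_SITES = {
    "NdeI": "CATATG",
    "NsiI": "ATGCAT",
    "XhoI": "CTCGAG",
    "XbaI": "TCTAGA",
    "AvrII": "CCTAGG",
}

# Oligonucleotide sequences as given in the text (5' to 3').
N39_51 = "TAAGGAGGACCATATGCATCGCTCGAGTCTAGACCTAGG"
N39_52 = "AATTCCTAGGTCTAGACTCGAGCGATGCATATGGTCCTCCTTAAT"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def sites_in_order(seq, sites):
    """List enzyme names sorted by the position of their first site in seq."""
    hits = [(seq.find(motif), name) for name, motif in sites.items() if motif in seq]
    return [name for _, name in sorted(hits)]


def flanking_overhangs(top, bottom):
    """Align top against the reverse complement of bottom and return the
    sequence flanking the paired region on each side (read in top-strand
    sense), or None if top is not contained in that reverse complement."""
    bottom_rc = reverse_complement(bottom)
    start = bottom_rc.find(top)
    if start < 0:
        return None
    return bottom_rc[:start], bottom_rc[start + len(top):]


if __name__ == "__main__":
    print(sites_in_order(N39_51, RECOGNITION_SITES))
    print(flanking_overhangs(N39_51, N39_52))
```

Under these assumptions the scan finds NdeI, NsiI, XhoI, and XbaI sites in the order the text gives, followed by an AvrII site (CCTAGG) that the text does not list. The annealed duplex carries a two-base extension at one end and a four-base AATT extension at the other. The AATT extension matches an EcoRI-cut end, and the two-base extension appears consistent with a PacI-cut end.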

Plasmid pKOS039-105 was digested with restriction enzymes NsiI and EcoRI,
and the resulting large fragment ligated to the 15.2 kb NsiI-EcoRI restriction fragment
of cosmid pKOS055-5 containing the oleAI gene to yield plasmid pKOS039-116.
Plasmid pKOS039-116 was digested with restriction enzymes NdeI and EcoRI, and
the resulting 15.2 kb fragment containing the oleAI gene was isolated and ligated to
the 6 kb NdeI-EcoRI restriction fragment of plasmid pKOS039-134B to yield plasmid
pKOS039-110 (Figure 3).

Plasmid pKOS039-134B is a derivative of pKOS039-104 described in PCT
patent application No. WO US99/11814, supra, prepared by digesting the latter with
restriction enzyme BglII and ligating the ~10.5 kb fragment to get pKOS39-104B.
Plasmid pKOS39-104B was digested with restriction enzyme PacI and partially
digested with restriction enzyme XbaI. The ~7.4 kb fragment was ligated with
PCR61A+62 fragment treated with restriction enzymes PacI and AvrII. The
PCR61A+62 fragment was generated using the PCR primers:
N39-61A, 5'-TTCCTAGGCTAGCCCGACCCGAGCACGCGCCGGCA-3'; and
N39-62, 5'-CCTTAATTAAGGATCCTACCAACCGGCACGATTGTGCC-3',
and the template was pWHM1104 (Tang et al., 1996, Molecular Microbiology 22(5):
801-813).

Plasmid pKOS039-110 DNA was passed through E. coli ET cells to obtain
non-methylated DNA, which was then used to transform Saccharopolyspora
erythraea cells, which contain a mutation in the eryAI coding sequence for the KS
domain of module 1 of DEBS that renders the PKS non-functional. The resulting
transformants produced detectable amounts of 14-desmethyl erythromycins.

Example 4 

Heterologous Expression of an Oleandolide PKS in Streptomyces lividans 
This Example describes the construction of an expression vector, plasmid
pKOS039-130, that has an SCP2* origin of replication and so can replicate in
Streptomyces host cells and drive expression of the oleAI, oleAII, and oleAIII gene
products under the control of the actI promoter and actII-ORF4 activator. A
restriction site and function map of plasmid pKOS039-130 is shown in Figure 4 of the
accompanying drawings. The expression of the oleA gene products in this host cell
results in the formation of a functional oleandolide PKS composed of the oleAI,
oleAII, and oleAIII gene products and the concomitant production of
8,8a-deoxyoleandolide. While the specific plasmids and vectors utilized in the
construction are described herein, those of skill in the art will recognize that
equivalent expression vectors of the invention can be readily constructed from
publicly available materials and the oleA gene containing cosmids of the present
invention deposited with the ATCC.

The 7.2 kb NsiI-XhoI restriction fragment of cosmid pKOS055-5 was cloned
into pKOS39-105 to give plasmid pKOS039-106. The 8.0 kb XhoI-PstI restriction
fragment of cosmid pKOS055-5 was cloned into commercially available plasmid
pLitmus28 to yield plasmid pKOS039-107. The 14 kb EcoRI-EcoRV and 5.4 kb
EcoRV-PstI restriction fragments of cosmid pKOS055-1 were ligated with pLitmus28
digested with EcoRI and PstI to yield plasmid pKOS039-115. The 19.5 kb SpeI-XbaI
restriction fragment from plasmid pKOS039-115 was inserted into pKOS039-73, a
derivative of plasmid pRM5, to yield plasmid pKOS039-129. The 15.2 kb PacI-EcoRI
restriction fragment of plasmid pKOS039-110 was inserted into pKOS039-129
by replacing the 22 kb PacI-EcoRI restriction fragment to yield plasmid pKOS038-174.
The 19 kb EcoRI restriction fragment from plasmid pKOS039-129 was then
inserted into pKOS038-174 to yield plasmid pKOS039-130 (Figure 4), which was
used to transform Streptomyces lividans K4-114 (K4-155 could also be used). The
resulting transformants produced 8,8a-deoxyoleandolide.

As noted above, the invention provides a recombinant oleAI gene in which the
coding sequence for the KS domain of module 1 has been mutated to change the
active site cysteine to another amino acid (the KS1° mutation). Recombinant PKS
enzymes comprising this gene product do not produce a polyketide unless provided
with diketide (or triketide) compounds that can bind to the KS2 or KS3 domain,
where they are then processed to form a polyketide comprising the diketide (or
triketide). This recombinant oleAI gene can be used together with the oleAII and
oleAIII genes to make a recombinant oleandolide PKS or can be used with modified
forms of those genes or other naturally occurring or recombinant PKS genes to make
a hybrid PKS.

To make the KS1° mutation in oleAI, the following primers were prepared: 
N39-47, 5'-GCGAATTCCCGGGTGGCGTGACCTCT; 
N39-48, 5'-GAGCTAGCCGCCGTGTCCACCGTGACC; 
N39-49, 5'-CGGCTAGCTCGTCGCTGGTGGCACTGCAC; and 
N39-50, 5'-CGAAGCTTGACCAGGAAAGACGAACACC. 
These primers were used to amplify template DNA prepared from pKOS039-106. The 
amplification product of primers N39-47 and N39-48 was digested with restriction 
enzymes EcoRI and NheI, and the amplification product of primers N39-49 and N39-50 
was digested with restriction enzymes NheI and HindIII, and the resulting 
restriction fragments were ligated to EcoRI- and HindIII-digested plasmid pLitmus28 
to yield plasmid pKOS038-179. The 1.5 kb BsrGI-BbvCI restriction fragment of 
plasmid pKOS038-179 was inserted into plasmid pKOS039-106 to yield pKOS098-2. 
The 7 kb NsiI-XhoI restriction fragment of plasmid pKOS098-2 and the 8 kb XhoI-EcoRI 
restriction fragment of plasmid pKOS039-107 are then used to replace the 
15.2 kb NsiI-EcoRI restriction fragment of plasmid pKOS039-110 to yield the 
desired expression vector, pKOS039-110-KS1°, which comprises the oleAI KS1° 
gene under the control of the promoter. 
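
As a quick consistency check on the primer design, the following Python sketch searches each of the four primers for the recognition sequences of the enzymes used in the digests. The recognition sequences (GAATTC for EcoRI, GCTAGC for NheI, AAGCTT for HindIII) are the standard ones and are not stated in this specification; the names SITES, PRIMERS, and sites_in are illustrative only.

```python
# Standard recognition sequences (an assumption drawn from general knowledge,
# not from this specification). All three sites are palindromic, so searching
# one strand is sufficient.
SITES = {"EcoRI": "GAATTC", "NheI": "GCTAGC", "HindIII": "AAGCTT"}

# Primer sequences as listed in the text, written 5' to 3'.
PRIMERS = {
    "N39-47": "GCGAATTCCCGGGTGGCGTGACCTCT",
    "N39-48": "GAGCTAGCCGCCGTGTCCACCGTGACC",
    "N39-49": "CGGCTAGCTCGTCGCTGGTGGCACTGCAC",
    "N39-50": "CGAAGCTTGACCAGGAAAGACGAACACC",
}


def sites_in(seq):
    """Return the sorted names of the enzymes whose site occurs in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)


# Map each primer name to the list of sites it carries.
found = {name: sites_in(seq) for name, seq in PRIMERS.items()}

for name, enzymes in found.items():
    print(name, enzymes)
```

Under these assumptions, N39-47 carries the EcoRI site, N39-48 and N39-49 each carry the NheI site at which the two amplification products are joined, and N39-50 carries the HindIII site.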

To provide an expression vector of the invention that encodes the complete 
oleandolide PKS with the recombinant oleAI KS1° gene product, the oleAI KS1° gene 
can be isolated as a PacI-EcoRI restriction fragment from plasmid pKOS039-110-KS1°, 
which is then used to construct an expression vector analogous to the 
expression vector plasmid pKOS039-130 in the same manner in which the latter 
vector was constructed. The resulting expression vector can be used in Streptomyces 
lividans, S. coelicolor, and other compatible host cells to make polyketides by 
diketide feeding as described in PCT patent publication No. WO 99/03986, 
incorporated herein by reference. 

Example 5 

Expression of an Oleandomycin/Picromycin Hybrid PKS 
This Example describes the construction of an expression vector, plasmid 
pKOS039-133, that can integrate into the chromosome of Streptomyces due to the 
phage phiC31 attachment and integration functions present on the plasmid and drive 
expression of the oleAIII gene product under the control of the actI promoter and 
actII-ORF4 activator. A restriction site and function map of plasmid pKOS039-133 is 
shown in Figure 5 of the accompanying drawings. This plasmid was introduced into 
S. lividans host cells together with a plasmid, pKOS039-83, that drives expression of 
the narbonolide PKS genes picAI and picAII (see PCT patent application No. 
US99/11814, supra). The expression of the oleAIII, picAI, and picAII gene 
products in a host cell results in the formation of a functional hybrid PKS of the 
present invention composed of the oleAIII, picAI, and picAII gene products and the 
concomitant production of 3-hydroxynarbonolide. While the specific plasmids and 
vectors utilized in the construction are described herein, those of skill in the art will 
recognize that equivalent expression vectors of the invention can be readily 
constructed from publicly available materials and the oleA gene containing cosmids of 
the present invention deposited with the ATCC. 

Two oligonucleotides were prepared for the insertion of the oleAIII gene into 
pSET152 derivative plasmid pKOS039-42: 
N39-59, 5'-AATTCATATGGCTGAGGCGGAGAAGCTGCGCGAATACCTGTGG; and 
N39-60, 5'-CGCGCCACAGGTATTCGCGCAGCTTCTCCGCCTCAGCCATATG. 
Plasmid pKOS039-115 was digested with restriction enzymes EcoRI and AscI to give 
the ~13.8 kb restriction fragment, which was ligated with the linker N39-59/N39-60 
to yield plasmid pKOS039-132. Plasmid pKOS039-132 was digested with restriction 
enzymes NdeI and XbaI to give the ~10.8 kb restriction fragment, which was ligated 
to the ~9 kb NdeI-SpeI restriction fragment of plasmid pKOS039-42 to yield plasmid 
pKOS039-133 (Figure 5). Plasmids pKOS039-133 and pKOS039-83 were co-transformed 
into Streptomyces lividans K4-114 (K4-155 can also be used; see 
Ziermann and Betlach, 1999, Biotechniques 26, 106-110, and U.S. patent application 
Serial No. 09/181,833, filed 28 Oct 1998, each of which is incorporated herein by 
reference). Protoplasts were transformed using standard procedures and transformants 
selected using overlays containing antibiotics. The strains were grown in liquid R5 
medium (with 20 µg/mL thiostrepton, see Hopwood et al., Genetic Manipulation of 
Streptomyces: A Laboratory Manual; John Innes Foundation: Norwich, UK, 1985, 
incorporated herein by reference) for growth/seed and production cultures at 30°C. 
Analysis of extracts by LC/MS established the identity of the polyketide as the 
expected compound, 3-hydroxynarbonolide. 
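
The design of the N39-59/N39-60 linker can be checked with a short Python sketch. It tests whether the two oligonucleotides, entered here as contiguous uppercase strings, anneal to a duplex with a four-base 5' overhang at each end, and whether that duplex contains an NdeI site. The overhang sequences expected for EcoRI (AATT) and AscI (CGCG) and the NdeI recognition sequence (CATATG) are standard values not stated in this specification; all function and variable names are illustrative.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Linker oligonucleotides from the text, written 5' to 3'.
N39_59 = "AATTCATATGGCTGAGGCGGAGAAGCTGCGCGAATACCTGTGG"
N39_60 = "CGCGCCACAGGTATTCGCGCAGCTTCTCCGCCTCAGCCATATG"

# Assumed length of the 5' single-stranded end left by EcoRI and by AscI.
OVERHANG = 4

top_overhang, top_core = N39_59[:OVERHANG], N39_59[OVERHANG:]
bottom_overhang, bottom_core = N39_60[:OVERHANG], N39_60[OVERHANG:]

# The oligos form a clean duplex if the cores are reverse complements.
duplex_ok = top_core == reverse_complement(bottom_core)

# NdeI site expected within the double-stranded region of the linker.
has_ndei = "CATATG" in top_core

print("duplex:", duplex_ok)
print("overhangs:", top_overhang, bottom_overhang)
print("NdeI site present:", has_ndei)
```

If these checks hold, the annealed linker presents an EcoRI-compatible end, an AscI-compatible end, and an internal NdeI site of the kind used in the subsequent NdeI-XbaI digest.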



Example 6 

Conversion of Erythronolides to Erythromycins 
A sample of an oleandolide (~50 to 100 mg) is dissolved in 0.6 mL of ethanol 
and diluted to 3 mL with sterile water. This solution is used to overlay a three day old 
culture of Saccharopolyspora erythraea WHM34 (an eryA mutant) grown on a 100 
mm R2YE agar plate at 30°C. After drying, the plate is incubated at 30°C for four 
days. The agar is chopped and then extracted three times with 100 mL portions of 1% 
triethylamine in ethyl acetate. The extracts are combined and evaporated. The crude 
product is purified by preparative HPLC (C-18 reversed phase, water-acetonitrile 
gradient containing 1% acetic acid). Fractions are analyzed by mass spectrometry, and 
those containing pure compound are pooled, neutralized with triethylamine, and 
evaporated to a syrup. The syrup is dissolved in water and extracted three times with 
equal volumes of ethyl acetate. The organic extracts are combined, washed once with 
saturated aqueous NaHCO3, dried over Na2SO4, filtered, and evaporated to yield 
~0.15 mg of product. The product is a glycosylated and hydroxylated oleandolide 
corresponding to erythromycin A, B, C, and D but differing therefrom as the 
oleandolide provided differed from 6-dEB. 
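
For orientation, the composition of the overlay solution in this Example can be worked out with a few lines of Python. The sketch assumes the sample mass is roughly 50 to 100 mg and takes the final volume as the stated 3 mL; the function name feed_solution and its parameters are illustrative only.

```python
def feed_solution(mass_mg, ethanol_ml=0.6, final_ml=3.0):
    """Return (aglycone concentration in mg/mL, ethanol fraction v/v)
    for a sample dissolved in ethanol and diluted to the final volume."""
    return mass_mg / final_ml, ethanol_ml / final_ml


# Lower and upper ends of the approximate sample range given in the text.
low = feed_solution(50.0)
high = feed_solution(100.0)

print(f"aglycone: {low[0]:.1f} to {high[0]:.1f} mg/mL")
print(f"ethanol: {low[1]:.0%} v/v")
```

On these figures the overlay contains about 17 to 33 mg/mL of the aglycone in about 20% (v/v) ethanol.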

Example 7 

Measurement of Antibacterial Activity 
Antibacterial activity is determined either by disk diffusion assays with 
Bacillus cereus as the test organism or by measurement of minimum inhibitory 
concentrations (MIC) in liquid culture against sensitive and resistant strains of 
Staphylococcus pneumoniae. 

The invention having now been described by way of written description and 
example, those of skill in the art will recognize that the invention can be practiced in a 
variety of embodiments and that the foregoing description and examples are for 
purposes of illustration and not limitation of the following claims. 
Claims 

1. An isolated recombinant DNA compound that comprises a coding 
sequence for a domain of a loading module or any one of extender modules one 
through four, inclusive, of an oleandolide polyketide synthase (PKS). 

2. The isolated recombinant DNA compound of Claim 1, wherein said 
domain is selected from the group consisting of a thioesterase domain, a KSQ domain, 
an AT domain, a KS domain, an ACP domain, a KR domain, a DH domain, and an 
ER domain. 



3. The isolated recombinant DNA compound of Claim 2 that comprises 
the coding sequence for a loading module and extender modules one and two of the 
oleandolide PKS. 

4. The isolated recombinant DNA compound of Claim 2 that comprises 
the coding sequence for the loading module and all six extender modules. 

5. An isolated recombinant DNA compound that comprises a coding 
sequence for a domain of a loading module or any one of extender modules one 
through six, inclusive, of an oleandolide polyketide synthase (PKS) operably linked to 
a promoter. 

6. The isolated recombinant DNA compound of Claim 5, wherein said 
coding sequence encodes a loading module or any one of extender modules one 
through four, inclusive, of oleandolide PKS. 

7. The isolated recombinant DNA compound of Claim 5 that is a 
recombinant DNA expression vector that further comprises an origin of replication or 
a segment of DNA that enables chromosomal integration. 



8. The recombinant DNA expression vector of Claim 7 that codes for 
expression of a PKS in Streptomyces host cells. 
9. A recombinant host cell selected from the group consisting of 
Streptomyces host cells and Saccharopolyspora host cells that comprises a 
recombinant DNA expression vector of Claim 7. 

10. The recombinant DNA expression vector of Claim 7 that encodes a 
hybrid PKS comprising at least a portion of an oleandolide PKS gene and at least a 
portion of a second PKS gene for a macrolide aglycone other than oleandolide. 

11. The recombinant DNA compound of Claim 10, wherein said second 
PKS gene is a DEBS gene. 



12. The recombinant DNA compound of Claim 11, wherein said hybrid 
PKS comprises a loading module and any one of extender modules one through four, 
inclusive, of oleandolide PKS and an extender module of DEBS. 

13. The recombinant DNA compound of Claim 10, wherein said hybrid 
PKS comprises a loading module and any one of extender modules one through four, 
inclusive, of oleandolide PKS and an extender module of narbonolide PKS. 



14. A recombinant host cell, which in its untransformed state does not 
produce oleandolide, that comprises a recombinant DNA expression vector of Claim 
11 and said cell produces a macrolide aglycone synthesized by said hybrid PKS. 



15. The recombinant host cell of Claim 14 that is Streptomyces lividans. 


16. The recombinant host cell of Claim 14 that is Saccharopolyspora 
erythraea. 



17. The recombinant host cell of Claim 13, wherein said oleandolide PKS 
has a non-functional KS domain in extender module one. 



18. The recombinant host cell of Claim 17 that is Streptomyces coelicolor 
or Streptomyces lividans. 
19. The recombinant host cell of Claim 17 that is Saccharopolyspora 
erythraea. 

20. A method for producing a polyketide in a cell, which method 
comprises transforming the cell with a recombinant expression vector that encodes at 
least a portion of an oleAI, oleAII, or oleAIII gene. 